ABSTRACT

Methods are presented (1) to partition or decompose a visual
scene into the bodies forming it; (2) to position these bodies in
three-dimensional space, by combining two scenes that make a
stereoscopic pair; (3) to find the regions or zones of a visual
scene that belong to its background; (4) to carry out the isolation
of objects in (1) when the input has inaccuracies. Computer
programs implement the methods, and many examples illustrate their
behavior. The input is a two-dimensional line-drawing of the scene,
assumed to contain three-dimensional bodies possessing flat faces
(polyhedra); some of them may be partially occluded. Suggestions
are made for extending the work to curved objects. Some comparisons 
are made with human visual perception. 

The main conclusion is that it is possible to separate a picture 
or scene into the constituent objects exclusively on the basis of 
monocular geometric properties (on the basis of pure form); in fact, 
successful methods are shown. 

Thesis Supervisor: Marvin L. Minsky. 

Title: Professor of Electrical Engineering. 
If the machine is asked to separate the bodies, it must say

(BODIES ARE AS FOLLOWS: (1 8 9) (2 7) (3 5 6) (10 15)
(4 13 14))

If asked to report the triangular prisms, it should answer 
(10 15 IS A TRIANGULAR PRISM) 

This thesis discusses the problems involved in this task. 

What should be done when the information is noisy, some lines
are missing, etc.?

How can the computer separate the background from the objects
forming the scene?

How should shadows be handled? 

How can stereoscopic vision be used? 

What about ambiguities and optical illusions? 

This thesis also discusses some related aspects of human
visual perception.

Key words and phrases related to this study are as follows: 


artificial intelligence
body
background
background discrimination
classification of images
CONVERT
cybernetics
feature recognition
geometric objects
geometric processing
graphic processing
graphical communication
graphical data
heuristic procedures
heuristic programming
identification
image
intelligence
line drawing
LISP
list processing
machine aided cognition
machine perception
mechanisation of visual perception
object identification
optical
optical illusion
pattern
pattern matching
pattern recognition
photography
photo-interpretation
picture
picture abstraction
picture processing
picture transformations
pictorial structures
polyhedra
recognition
robot
scene
scene analysis
solids
stereoscopic
symbol manipulation
three-dimensional
three-dimensional scenes
three-dimensional solids
two-dimensional patterns
vision
visual
visual information processing
visual object recognition
visual perception
visual scenes

Computer Review (A. C. M.) index numbers: C.R. 3.61, 3.63, 4.22, 5.20.

Why this work was chosen as a thesis topic 

The present work was carried out using the facilities of the
Artificial Intelligence Group of Project MAC, at M. I. T. Currently,
the main goal of the Artificial Intelligence Group (AI Group) is "to
extend the way computers can interact with the real world:
specifically to develop better sensory and motor equipment, and
programs to control them." {Minsky, Status Report II}. From such
efforts, a robot or mechanical manipulator has been constructed,
consisting of a PDP-6 computer, an image dissector camera, and a
mechanical arm and hand (see pictures).


Figure: IMAGE DISSECTOR CAMERA

"These 'eyes and hands' are eventually to be able to do reasonably
intelligent things but first, of course, it is difficult enough to
get them to do things that are easy for people to do." {Ibid.}

Figure: An image dissector silently watches a triangular prism in the
vision laboratory of the A.I. Group.
The work was naturally divided into visual information processing
(computer vision) and manipulation and control of the arm-hand.

Thus, when I came as a graduate student from the Politecnico de Mexico
to M. I. T. (Sept. 65) and became associated with the AI Group, I
found a great interest there in graphical communication with computers.
Moreover, it was felt that symbol manipulation techniques would be
relevant to this area. I was fortunate enough to have had some
contact with the LISP language in some of its implementations:
MB-LISP {McIntosh 1963} * and Hawkinson-Yates LISP {Hawkinson 64}
at the Centro Nacional de Calculo of the Politecnico; in fact, I
became interested in the area because I felt that it would be possible
to handle two-dimensional structures much in the same fashion as one
handles lists (that is, one-dimensional structures or strings of
symbols) in a pattern-driven language, such as CONVERT {1965}, recently
finished at that time.

The area also offered a good opportunity to understand and 
evaluate several techniques, computers, equipment, etc. Consequently 
I decided to work in it. 


* The braces { } always indicate a reference to the
bibliography at the end of this thesis, where the complete title,
date, etc., of the paper can be found.
SIMPLIFIED VIEW OF SCENE ANALYSIS

- TO THE BUSY READER -

This section presents a general view of the problems 
in the thesis and their solutions; if you are short of time, 

(1) Read the abstract and this section. 

(2) Choose some scenes from section 'Analysis of many scenes', 
and observe how the computer perceives them. 

(3) Look through the table of contents, select additional topics. 


Scene Analysis 

Scene analysis is the result of interaction between
optical data coming from the Eye, and knowledge about the visual world
stored in the programs. In all that follows, the optical data entering
through the Eye is reduced to a line drawing; this pass is called
pre-processing, and it will be only briefly sketched here.

After preprocessing, such a line drawing is analyzed in order
to discover and recognize given objects in it. The process is
called recognition.

This thesis is concerned with recognition.

We now give a simplified exposition of both processes. Recognition
will be discussed abundantly in the remainder of this thesis, since
it is the main topic; readers who wish for more information on
pre-processing or other approaches should consult the references, for
instance {my MS Thesis} and {A C Shaw FJCC 68}. See also page 60.

The stylized presentation that follows is only an example; in
particular, scene analysis does not need to follow the sequence
pre-processing, then recognition. See 'Division of work in
Computer Vision' on page 60.
Each inhomogeneous square is divided in four, ignoring
again the homogeneous sub-squares.



The process is repeated a few times more. 
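The repeated subdivision just described can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the pre-processing program used with the Eye: the function name `inhomogeneous_squares`, the representation of the picture as a square grid of intensity values, the exact-equality test for homogeneity, and the `min_size` stopping parameter are all hypothetical.

```python
def inhomogeneous_squares(image, x, y, size, min_size=2):
    """Return the (x, y, size) squares that are still inhomogeneous
    once subdivision has stopped at side length min_size.

    Assumption: a square is 'homogeneous' when all its cells are equal;
    a real Eye would use a tolerance on measured intensities.
    """
    cells = [image[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if max(cells) == min(cells):
        return []                      # homogeneous square: ignored
    if size <= min_size:
        return [(x, y, size)]          # smallest square we examine
    half = size // 2
    found = []
    for dy in (0, half):               # divide the square in four
        for dx in (0, half):
            found += inhomogeneous_squares(image, x + dx, y + dy, half, min_size)
    return found


# A 4 x 4 picture whose left column is dark and whose remainder is bright.
image = [[0, 1, 1, 1] for _ in range(4)]
squares = inhomogeneous_squares(image, 0, 0, 4)
print(squares)
```

Only the two sub-squares straddling the dark/bright boundary survive; these are the places where lines and vertices would later be fitted.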
The squares are now converted to lines and vertices.
What follows is merely a brief summary of the processes in
recognition. A more systematic presentation and classification of
processes in recognition is found in 'Division of work in Computer
Vision', on page 60.

A program would check in the original scene, on both sides of
each line, for continuation across the line of textures, local cracks,
etc. On these and other grounds, shadows would be picked up and
erased:
A line-proposer program studies the abstract or "symbolic" scene and,
using some heuristics and general principles, proposes places where
it is quite probable that a line is missing:

These places are searched by a line-verifying program, which is an
especially sensitive test that uses fine measurements from the
original scene, and often it will pick up a boundary that was missed
in the less-intelligent homogeneity phase. Here it can be practical
to apply a very strict and sensitive test, because the program
knows very accurately where the line should be, if it really exists
at all. For example, even if the two faces have almost equal
illumination the Eye can pick up a thin, faint highlight from the edge
of the cube. It would have been hopelessly expensive to look for
such detailed phenomena over the whole picture at the start.





At this stage our program SEE (page 58) comes
into action. This program treats different kinds of local
configurations as providing different degrees of evidence
for 'linking' the faces. This evidence is obtained mainly
at vertices, and at boundaries between regions.

A vertex is in general a point of intersection of
two or more boundaries of regions. These regions might or
might not be faces of a single body. SEE examines the
configuration of lines meeting at the vertex to obtain
evidence relevant to whether the regions involved belong
to the same object.

For instance, in the vertex configurations "ARROW" and
"FORK" (a complete classification of vertices can be found
below in table 'VERTICES'),

Figure: "FORK" and "ARROW"

the "FORK" suggests linking face a to face b, b to c, and c to a.
The "ARROW" links a with b. A "leg" (which depends on nearly
parallel lines) would add a weak link, in addition to the ordinary
(or strong) link placed by its 'arrow'; a "T" looks for a matching
"T", and if found, two strong links are placed as shown. Also, a
"T" counts against (inhibiting, that is) linking a with c, or
b with c.

Figure: 'LEG' (weak link shown dotted) and matching T's (two strong links)

These links, for our example, are 



and may be represented as 
indicating two groups of linked faces, that is, two bodies:

(BODY 1. IS 1 2 4)

(BODY 2. IS 3 5 6)
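In this simplified exposition, a body is just a group of faces held together by links. The Python sketch below illustrates that grouping only; it is not SEE itself, which weighs and inhibits links as described later. The names `links_from_vertex` and `group_faces`, the tuple encoding of vertices, and the particular vertices in the example are assumptions made for illustration; for the "ARROW" we assume the caller passes the two faces that the text calls a and b.

```python
def links_from_vertex(kind, faces):
    """Links suggested by one vertex (simplified; no inhibition)."""
    if kind == "FORK":                 # link a-b, b-c, c-a
        a, b, c = faces
        return [(a, b), (b, c), (c, a)]
    if kind == "ARROW":                # link a with b
        a, b = faces
        return [(a, b)]
    return []                          # other vertex types: no link here


def group_faces(faces, links):
    """Group faces connected by links (union-find)."""
    parent = {f: f for f in faces}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]   # path halving
            f = parent[f]
        return f

    for a, b in links:
        parent[find(a)] = find(b)
    groups = {}
    for f in faces:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical vertices for a scene with six faces.
vertices = [("FORK", (3, 5, 6)), ("ARROW", (1, 2)), ("ARROW", (2, 4))]
links = [l for kind, fs in vertices for l in links_from_vertex(kind, fs)]
bodies = group_faces([1, 2, 3, 4, 5, 6], links)
print(bodies)
```

With these assumed vertices the faces fall into two groups of three, matching the two bodies reported above.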

If in addition we give at this point to
the computer the definition or concept
of a 'triangular prism', through an
abstract model of it {my MS Thesis}, we
can get

(1 2 4 IS A TRIANGULAR-PRISM)

(3 5 6 IS A CUBE)


Recognition has finished. 


Analysis of several examples 

A larger variety of kinds of evidence is used in more complicated 
scenes, making the program more intelligent in its answers: 

(1) The links themselves are inhibited by conditions or configurations
at the neighbor vertices and faces; for instance, in the case
of a "FORK", the (strong) links indicated below are inhibited:

(2) The links to the background are ignored [complete descriptions
of conditions for producing and cancelling links are to be
found in section 'SEE, a program that finds bodies in a scene'].

(3) A hierarchical scheme is used that first finds subsets of faces 
that are very tightly linked (e. g., by two or more links). 
These "nuclei" then compete for more loosely linked faces
(faces linked through one weak link and one strong link,
or one face completely unlinked, except by one strong link).

By not considering a single link, weak or strong, as enough
evidence for assigning two faces as part of the same object, this
algorithm requires two "mistakes" (that is, two careless
placements of links between regions that should not be considered as
forming the same body) to make an identification error.

The bodies of the following scenes are found by SEE without 
difficulty. 



Note that of the strong links available to the "FORK" marked with
an arrow, two were prohibited or inhibited and only one is produced
by SEE.

In the following figure, the "FORK" of the big object is missing.



Region (definition). Surface bounded by simply closed curves.

We will consider the outer background (:16 in fig 'L10', page 59)
to be also a region.

Nucleus (definition). A nucleus (of a body) is a set of regions.

Linked nuclei (definition). Two nuclei A and B are linked if
regions a and b are linked, where a ∈ A and b ∈ B.

First rule: If two nuclei are linked by two or more strong links,
they are merged into a larger nucleus.

For instance, regions :8 and :11 are put together, because there
exist two strong links among them, to form the nucleus :8-11.

Maximal nuclei: Starting from nuclei containing individual regions,
we let the nuclei grow and merge under the First rule, until no new
nuclei can be formed. When this is the case, the scene has been
partitioned into several "maximal" nuclei; between any two of these
there is at most one strong link.

For instance, regions :8 and :11 are put together by the First
rule; now we see that region :4 has two links with nucleus :8-11,
and therefore the new nucleus :8-11-4 is formed. This last is a
maximal nucleus.
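The growth of nuclei under the First rule can be sketched as a loop that keeps merging until nothing changes. This Python fragment is an illustrative sketch under assumptions, not the SEE program: the names `strong_links_between` and `maximal_nuclei`, the encoding of a link as a `(region, region, kind)` triple, and the small example scene are all hypothetical.

```python
from itertools import combinations


def strong_links_between(a, b, links):
    """Count the strong links joining a region of nucleus a to one of b."""
    return sum(1 for p, q, kind in links
               if kind == "strong"
               and ((p in a and q in b) or (p in b and q in a)))


def maximal_nuclei(regions, links):
    """Apply the First rule until no two nuclei share two strong links."""
    nuclei = [frozenset([r]) for r in regions]   # one region per nucleus
    merged = True
    while merged:
        merged = False
        for a, b in combinations(nuclei, 2):
            if strong_links_between(a, b, links) >= 2:
                nuclei.remove(a)
                nuclei.remove(b)
                nuclei.append(a | b)             # the larger nucleus
                merged = True
                break                            # restart with the new list
    return nuclei


# Hypothetical scene: two strong links between 1 and 2, region 3 linked
# once to each of them, and region 4 linked once to region 3.
links = [(1, 2, "strong"), (1, 2, "strong"),
         (3, 1, "strong"), (3, 2, "strong"),
         (4, 3, "strong")]
nuclei = maximal_nuclei([1, 2, 3, 4], links)
print(nuclei)
```

Region 3 joins only after 1 and 2 have merged, because its two links then reach the same nucleus; region 4, with a single strong link, stays apart, so between the two final nuclei there is at most one strong link.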



The name of a region is a number preceded by a colon, such as :16.
The First rule is applied again and again, until all nuclei are
maximal nuclei; then the following rule is applied:

Second Rule: If nuclei A and B are joined by a strong and a weak
link, they are merged into a new nucleus.

The Third rule is applied after the Second rule.

Third Rule: If nucleus A consists of a single region, has one link
with nucleus B and no links with any other nucleus, A and B are merged.
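The Second and Third rules can be sketched in the same style. The Python below is a hedged illustration, not the SEE program: the helper names (`count_links`, `merge_where`, `second_rule`, `third_rule`), the `(region, region, kind)` encoding of links, and the example nuclei are assumptions. In particular, we assume that the "one link" of the Third rule is counted over strong and weak links together.

```python
from itertools import combinations


def count_links(a, b, links, kind):
    """Count links of the given kind between nuclei a and b."""
    return sum(1 for p, q, k in links
               if k == kind and ((p in a and q in b) or (p in b and q in a)))


def merge_where(nuclei, should_merge):
    """Merge pairs of nuclei for which should_merge holds, until stable."""
    nuclei = list(nuclei)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(nuclei, 2):
            if should_merge(a, b, nuclei):
                nuclei.remove(a)
                nuclei.remove(b)
                nuclei.append(a | b)
                changed = True
                break
    return nuclei


def second_rule(nuclei, links):
    """Merge nuclei joined by a strong and a weak link."""
    def test(a, b, _all):
        return (count_links(a, b, links, "strong") >= 1
                and count_links(a, b, links, "weak") >= 1)
    return merge_where(nuclei, test)


def third_rule(nuclei, links):
    """Merge a single-region nucleus whose only link goes to one nucleus."""
    def total(a, b):
        return (count_links(a, b, links, "strong")
                + count_links(a, b, links, "weak"))

    def lonely(a, b, all_nuclei):
        return (len(a) == 1 and total(a, b) == 1
                and all(total(a, c) == 0
                        for c in all_nuclei if c != a and c != b))

    def test(a, b, all_nuclei):
        return lonely(a, b, all_nuclei) or lonely(b, a, all_nuclei)
    return merge_where(nuclei, test)


# Hypothetical maximal nuclei and the links remaining between them.
start = [frozenset({1, 2, 3}), frozenset({4, 5}),
         frozenset({6}), frozenset({7}), frozenset({8, 9})]
links = [(3, 4, "strong"), (2, 5, "weak"),   # strong + weak: Second rule
         (6, 1, "strong"),                   # lone region: Third rule
         (7, 1, "strong"), (7, 8, "weak")]   # two links: 7 stays alone
result = third_rule(second_rule(start, links), links)
print(result)
```

In this example region 7 is left as a nucleus of its own, because it has links with two different nuclei, which mirrors the behavior described for region 9 in the text that follows.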


(10 11) does not join the bigger nucleus because (10 11) does not
consist of a single region. Below, 9 does not join (7 8) or (4 5)
because 9 has two links:


The Third rule tends to avoid proposing bodies consisting of a 
single region. 



Here three links were erroneously placed but SEE did not get
confused by them.

In complicated scenes, coincidences cause two objects to line up.
As a result, vertices of different objects are merged, two objectively
different lines appear as one, and so on. The next example illustrates
these phenomena and shows how SEE copes with the problem.

SEE transforms the above scene as follows:





As we see, the nuclei are going to be correctly formed, and SEE will
also analyze this scene correctly.

The bodies do not need to be rectangular, prismatic, or convex. They
only need to be rectilinear. As we will see later, even curved objects
may be identified, under certain restrictions (cf. Table 'ASSUMPTIONS').



Figure 'BRIDGE' 
All the bodies in "BRIDGE" are adequately found. A new heuristic is
used here: 



three parallel lines comprising regions that are not background, and 
having the background as a neighbor, and a 'T' in the center line, 
originate a strong link, as shown above. 

The following locally ambiguous scene is correctly parsed by 
our program: 




If we add another block to the right, the program makes a mistake and 
fails to see one of the inner cubes: 



Figure ’MOMO' also gets decomposed accurately: 



Figure 'MOMO' 





The local links allow correct identification of the following body:



If the lateral faces do not have parallel edges, a mistake occurs 
(conservative behavior, page 2 * 2 .); 





Another mistake occurs in the following scene: 



Conclusion 


The performance of this program shows that it is possible to 
separate a scene into the objects forming it, without needing to know 
the objects in detail; SEE does not need to know the 
'definitions' or descriptions of a pyramid, or a pentagonal prism, 
in order to isolate these objects in a scene containing them, even in 
the case where they are partially occluded. 

The program will be fully analysed in the following pages. 
Problems in analyzing a visual scene *

The problem of taking a two-dimensional image (or several such 
images), and constructing from it a three-dimensional interpretation, 
involves many operations that have never been studied, to say nothing 
of being realized on a computer. We will list some of these here; 
a more complete list is found in my M.S. Thesis {MAC TR 37}; some
have been side-stepped or ignored by the present recognition system; 
the problems which we did solve are discussed in the text. 

Among the facilities that must be available are: 

a) Spatial frame-of-reference : setting up a model of the relation 
between the eye(s) and the general framework of the physical task, 

i. e., where are the background, the "table" or working surface, 
and the mechanical hand(s)? 

b) Finding visual objects , and localizing them in space with respect 
to the eye-table-background-hand model. 

c) Recognizing or describing the objects seen , regardless of their 
position, accounting for partly-hidden objects, recognizing objects 
already "known" by descriptions in memory and representing the 
three-dimensional form of new objects. 

d) Building an internal "structural model" of what has been seen,
for the purpose of task-goal analysis. 

Among the important factors are the effects of: 

1. Both the camera's focus and its depth-of-focus.

2. Illumination of the objects. Light affects the appearance of
objects in obvious and subtle ways -- in scenes with multiple 
objects and lights we get complicated shadows, which have to 
be detected or rejected. The boundary between two faces may 
disappear if they get equal illumination from a diffuse light source. 

3. Perspective and distance effects. Even for geometric objects with
flat surfaces, the two-dimensional projection of their surface
features can take many forms, and the system has to be able to deal
with all of them. It works both ways, of course: once identified,
the appearance can give valuable information about the object's
orientation, size, and even (under some conditions) its absolute
spatial location {Roberts 1963}.

4. Accidental vs. essential visual features. Two objects of the same
shape and location can have very different visual presentations
because of their surface textures and markings. We need to
distinguish these two-dimensional "decorations" from real
three-dimensional spatial features.

* Adapted from Status Report II {Minsky 67}. See also Project MAC
Progress Report {1967, 1968}.


Other projects 


Here are the main robot groups at a panel discussion. 



1968 Fall Joint Computer Conference
December 9-10-11, San Francisco Civic Center

Problems in the Implementation of Intelligent Robots

This session, the second of three sessions on robotry, will
consist of a panel discussion among technical people
involved in the design and construction of mechanical
devices that are capable of significant independent
"intelligent" behavior, usually by means of computer control. The
projects represented on this panel have drawn upon
state-of-the-art capabilities in many technologies including
mechanical engineering, pattern recognition, heuristic
programming, neural networks and computer systems. Thus,
the discussion, which will be conducted at a fairly technical
level, should be of interest to engineers and scientists
concerned with the problems of interfacing a variety of
disciplines, as well as to those interested in learning about the
nature of current embryonic "robot" systems.

NOTE: Tickets priced at $5.00 each (including lunch) for
the all-day tour of "live robot" installations on Wednesday,
Dec. 11th, will be available at this session.

Panel Members

MR. L. CHAITIN
Artificial Intelligence Group
Stanford Research Institute
ROBOT STUDIES AT STANFORD RESEARCH INSTITUTE

PROF. J. A. FELDMAN
Computer Science Department
Stanford University
THE ROBOT PROJECT AT STANFORD UNIVERSITY

DR. T. SHERIDAN
Dept. of Mechanical Engineering
MIT
HUMAN CONTROL OF REMOTE COMPUTER MANIPULATORS

MR. R. J. LEE
Air Force Avionics Lab.
Wright-Patterson AFB
GENERAL PURPOSE MAN-LIKE ROBOTS

PROF. S. PAPERT
Artificial Intelligence Project
MIT, Project MAC
THE MIT HAND-EYE PROJECT

MR. L. SUTRO
Dept. Aeronautics and Astronautics
MIT
ROBOT DEVELOPMENT AT THE MIT INSTRUMENTATION LABORATORY
RELATED RESEARCH


Previous work by the author 


A programming language is described which is applicable to
problems conveniently described by transformation rules. By
this is meant that patterns may be prescribed, each being
associated with a skeleton, so that a series of such pairs may
be searched until a pattern is found which matches an
expression to be transformed. The conditions for a match are governed
by a code which also allows subexpressions to be identified and
eventually substituted into the corresponding skeleton. The
primitive patterns and primitive skeletons are described, as
well as the principles which allow their elaboration into more
complicated patterns and skeletons. The advantages of the
language are that it allows one to apply transformation rules
to lists and arrays as easily as strings, that both patterns and
skeletons may be defined recursively, and that as a consequence
programs may be stated quite concisely.

Abstract of Convert paper in Comm. A.C.M.

Because it is easy to write and modify a program in Convert,
the language has been extensively used to quickly test 'good'
and 'great' ideas, new algorithms, etc. It is embedded in
the LISP of the PDP-6 computer (A.I. Group), in the IBM-7094
(Project MAC, MIT), in the CDC-3600 (Uppsala University, Sweden),
and in the SDS-940 (Univ. of California, Berkeley). A paper in the
A. C. M. and {MAC M 305} describe the language; examples of
simple programs written in Convert are in {MAC M 346}; a book
article {Patterns and Skeletons in Convert} is oriented
toward the Lisp consumers. For our Spanish readers, two
Bachelor's Theses {Guzman 1965} {Segovia 1967} describe the
language and processors, and give examples.


SCENE ANALYSIS 

(1) Polybrick {MAC M 308} {Hawaii 69} is a Convert program that
works on a scene or picture, expressed as a line drawing, and finds 
parallelepipeds in it. 
(2) We would like to be able to specify in some suitable notation
models of the classes of objects we are interested in (such as 'cube',
'triangular prism', 'chair'), and make a program look for all
instances of any given model in a given scene or figure. Two arguments
would have to be supplied to our program: the model of the object
we are interested in, and the scene that we want to analyse.

Programs to do this are described in {AFCRL-67-0133} and {MAC M 342}.
In these early programs, partially occluded objects get incorrectly
identified. These programs are also written in Convert, and work
by transforming or compiling the model, written in a picture
description language, into a Convert pattern, which searches the scene for
instances of the model.

(3) A Master's Thesis {MAC TR 37} discusses many ways to identify
objects of known forms. Different kinds of models and their
properties are analyzed.

(4) It is important to be able to find the bodies that form a scene,
without knowing their exact description or model. SEE is a program
that works on a scene presumably composed of three-dimensional
rectilinear objects, and analyzes the scene into a composition of
three-dimensional objects. Partially occluded objects are usually
properly handled. This program was discussed in {MAC M 357},
{Guzman FJCC 68} and {Pisa 68}, and this thesis discusses a later
version.

(5) The present thesis goes beyond these topics to discuss also
handling of stereo information (two views, left and right, of the
same scene), improvements to deal with noisy (imperfect) input,
figure-background discrimination, and a few other subjects.

Canaday 


Rudd H. Canaday in 1962 analysed scenes composed of
two-dimensional overlapping objects, "straight-sided
pieces of cardboard." His program breaks the image
into its component parts (the pieces of cardboard),
describes each one, gives the depth of each part in the
image (or scene), and states which parts cover which.
Roberts


The problem of machine recognition of pictorial data has long been a
challenging goal, but has seldom been attempted with anything more
complex than alphabetic characters. Many people have felt that research on
character recognition would be a first step, leading the way to a more
general pattern recognition system. However, the multitudinous attempts at
character recognition, including my own, have not led very far. The reason,
I feel, is that the study of abstract, two-dimensional forms leads us away
from, not toward, the techniques necessary for the recognition of
three-dimensional objects. The perception of solid objects is a process which can
be based on the properties of three-dimensional transformations and the
laws of nature. By carefully utilizing these properties, a procedure has been
developed which not only identifies objects, but also determines their
orientation and position in space.

Three main processes have been developed and programed in this report.
The input process produces a line drawing from a photograph. Then the
three-dimensional construction program produces a three-dimensional
object list from the line drawing. When this is completed, the
three-dimensional display program can produce a two-dimensional projection of the
objects from any point of view. Of these processes, the input program is the
most restrictive, whereas the two-dimensional to three-dimensional and
three-dimensional to two-dimensional programs are capable of handling
almost any array of planar-surfaced objects. [From Roberts]

Roberts in 1963 described programs that (1) convert
a picture (a scene) into a line drawing and (2) produce
a three-dimensional description of the objects
shown in the drawing in terms of models and their
transformations. The main restriction on the lines is
that they should be a perspective projection of the
surface boundaries of a set of three-dimensional objects
with planar surfaces. He relies on perspective and
numerical computations, while SEE uses a heuristic and
symbolic (i.e., non-numerical) approach. Also, SEE
does not need models to isolate bodies. Roberts' work is
probably the most important and closest to ours.


Mechanical Manipulator Groups (see also page 32 ) 


Actually, several research groups (at Massachusetts
Institute of Technology, at Stanford University,
at Stanford Research Institute) work actively
towards the realisation of a mechanical manipulator, i.e.,
an intelligent automaton that could visually perceive and
successfully interact with its environment, under the
control of a computer. Naturally, the mechanisation of
visual perception forms part of their research, and
important work begins to emerge from them in this area.
THE CONCEPT OF A BODY


In this section definitions of a body or object will be proposed.

The criterion is that they agree in general with the common use of
the word 'body', while at the same time they should lend themselves
to implementation in a computer program.

Introduction 

Our ultimate interest is to examine a two-dimensional scene (a
picture, line drawing, or painting), presumably a representation
(projection, photograph) of a three-dimensional scene (a subset of
the "universe" or "real world") and to find in it objects or bodies
contained in the real scene. More specifically, the aim is to find
the two-dimensional representations (projections, photographs) of
the different three-dimensional bodies present in the scene.

The phrase "two-dimensional representation of a
three-dimensional body" will be shortened to "two-dimensional
body" or even to "body", when no confusion arises.

That is, we have to analyze a two-dimensional scene into collections
of two-dimensional entities (surfaces, regions, lines), each of which
makes "three-dimensional sense" as a two-dimensional projection
of a three-dimensional body.

The problem is inherently ambiguous

A scene can be considered as a set of surfaces (faces or regions);
a body belonging to that scene is then an "appropriate" subset of this
collection. Therefore, the problem of finding bodies in a scene is
equivalent to the problem of partitioning the set into appropriate
subsets, each one of them representing or forming a body (scene 'CHURCH').

The problem is inherently ambiguous, since different collections
of three-dimensional bodies can produce the same 2-dim scene; therefore
a given scene can be partitioned in many ways into bodies.
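To give a feeling for how many partitions are possible, the short Python sketch below counts all the ways of splitting a set of n regions into non-empty subsets (the Bell number). This is only an upper bound offered for illustration: it ignores every geometric constraint, and the function name `bell` is our own.

```python
def bell(n):
    """Number of partitions of a set of n elements (Bell triangle)."""
    row = [1]
    for _ in range(n):
        new = [row[-1]]            # each row starts with the last entry
        for v in row:
            new.append(new[-1] + v)
        row = new
    return row[0]


# A scene with eight regions admits this many set partitions.
print(bell(8))
```

For eight regions the count is 4140, of which a human observer would normally accept only one or a few as sensible decompositions into bodies.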
It is desired to make a "natural" partition or decomposition
of the scene, natural in the sense that it will agree with
human opinion.*

Figure 'CHURCH'

Set of eight elements. Adequate subsets (bodies) are [2 4],
[1 3 5 6 7 8]. In a more complicated example, people may
differ in their parsing of scenes.

To define a three-dimensional body is no problem
[a philosopher may disagree, perhaps in singular cases]:


Three-dimensional body (definition): 

A connected volume limited by a 
continuous, two-sided surface composed of 
portions of planes. 

Restriction: The above definition covers only polyhedral bodies, 
that is, those having flat faces. 

Restriction: No holes. 

No-restriction: Bodies do not need to be convex. 

Roughly speaking, a three-dimensional body is something that does not
fall apart into pieces when lifted [this may be used as an operational
definition of a body, given a mechanical manipulator to make the
necessary tests].


Given a three-dimensional body, we generate a two-dimensional body
by taking a picture of it, as follows.

Two-dimensional body (definition). Figure formed by the projection of
a three-dimensional body. Generally, the projection
is isometric or perspective.

Thus, this is a view in two dimensions of a solid body, from some
particular point of view.

* Without such a requirement, the problem has a trivial solution
(see Metatheorem on page 39).

Unfortunately, a two-dimensional body could come in this way from
any of several different 3-dim bodies or, what is worse, two 3-dim bodies
together can give rise to a single 2-dim body. For instance, in fig. 'BENT',
Figure 'BENT'

Two blocks, or a bent brick.

this two-dimensional body could be generated by a "bent brick" or by
two blocks adjacent to each other. We are dealing with one
three-dimensional body in the first case, and with two in the second, but the
2-dim entity (namely, the drawing of figure 'BENT') is the same, and
we are confronted with an inherent ambiguity.

A more striking example is given in Fig. 'SIBELIUS',
which could be the representation of 365 cylindrical bodies, or the
picture of a sculpture (one body) in Helsinki.
Such colorful contradictions point towards the need to lay down
a more careful definition of our task. For instance, no one would think
that

Fig. 'CUBE'

No one would think...

contains three bodies. Nevertheless (see fig. 'PARALLELEPIPED' on the
next page), that could be the case.

These two extremes are to be avoided by an appropriate definition
of a body and the corresponding computer program.



Legal scene (definition). That 2-dim scene in which each line is a
boundary of some region.

Figure: three example scenes, captioned "Legal scene.", "Illegal.", "Illegal."

See also comments to scene R3, and 'Illegal Scenes' (page 2 . 17 ), in 
section 'On noisy input'. 

"Any legal scene can always be the projection of one or 
more three-dimensional objects." 

To prove it, it suffices to note that each legal scene is composed 
of regions, and each of them could be interpreted as the 
base of a pyramid, all the faces meeting at the apex, which is occluded by 
the base. 

Therefore, each legal scene can be obtained by projecting or 
photographing an adequate arrangement of such pyramids. 


We can always construct a 
legal scene by photographing 
(or projecting) suitable 
3-dim polyhedra. 
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Figure 'PARALLELEPIPED' 

An improbable decomposition of a scene. 

Trivial partition. By use of the metatheorem, we can always find a 
decomposition of a visual scene into three-dimensional bodies; we 
call this answer "trivial". Humans do not split scenes this way. 

Our program should not, either. 

But the metatheorem points out that "impossible scenes" are never 
found among the legal scenes (see section 'On Optical Illusions'); 
these always have at least one interpretation. 

We are trying to give criteria for proposing bodies that will 
suit our ends, which are to define a "reasonable" or "standard" body. 

This will permit us to judge the performance of a program designed 
to find objects in a scene. 

Several criteria are possible: 

1. Roberts {1963} suggests: given several models of three-dimensional 
   bodies, use some numerical technique, such as least squares 
   fitting, to find which model fits best through a suitable 
   transformation, and accept this match if the error is tolerably 
   small. Complicated compositions of elementary bodies 
   are considered. 

2. Ledley {1962} would propose: in terms of suitable primitive components 
   (arcs, legs, etc.), make a syntactical analysis of the scene, 
   with the help of a grammar, in such a way that the models of 
   the object you want to identify are formed recursively from 
   these primitive components and (perhaps) other bodies. 
   Narasimhan {1962} and Kirsch {1964} would agree on this 
   linguistic approach. A. C. Shaw {Ph. D. Thesis} assents. 

3. Guzman {1967} suggests: prepare models which specify a fixed 
   topology but where other relations (length of sides, parallelism 
   of two lines, equality of angles) are specified through 
   the use of open variables (UAR variables, in CONVERT). 
   Evans {1968} would agree with that. 

These approaches require the existence of a model which describes the 
object to be identified; the model specifies a particular 3-dim object 
(or a class of them). These approaches are answering more than what 
was asked; they tell not only "yes, it is a body", but also 
"it is a pyramid". The current question is more general. 

It is desired to know if something is a body, any body, 
even one which has not been seen before. 

If it were possible to implement a program to answer that question, 
then that would be a working definition of a body. SEE is a program 
which comes close to this goal, so that it could be pragmatically stated: 

2-dim body "a la SEE" (definition). A body is each set of regions 
recognised by the program SEE as such. 

This definition allows the following 
Criticism: A perfect way to hunt lions is to 
           capture any entity E, and to call 
           that a lion, by definition. 

That is, although this definition is precise, SEE may make 
decisions "contrary to common sense"; also, for purposes of judging 
the behavior of the program, this definition is useless, since SEE 
will be perfect 100 per cent of the time, irrespective of its answers. 

We are, finally, tempted to conclude that 'common sense', or 
better, "human common sense" plays a role in the definition of a body, 
since what we are trying to characterise is a usual body, normal body, 
common body, etc. But even people may differ in their parsings of 
scenes. We could, of course, give a scene (such as 'MOMO' in page 77) 
to 100 subjects, ask them to identify the different bodies in it, and 
come up with some sort of 'average' or 'general consensus': 

2-dim body (statistical and human-behavioral definition). Each one of 
the subsets into which a scene is partitioned by many subjects. 

It is understood that, in this spirit, the human subjects should be 
motivated to satisfy a 

Simplicity criterion: Of the several "reasonable" interpretations 
(decompositions) of a scene, the one which 
contains the smallest number of bodies is 
preferable. 
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That is, an explanation or decomposition is simpler (and preferable) 
if it can be done with fewer parts. 

Simplicity is not to be achieved at any cost: the parsing 
of the scene has to produce 'plausible' bodies, since "simplicity" 
could always be achieved if each scene is reported as a single, 
gigantic body, obtained perhaps from more familiar ones through liberal 
use of adhesives (cf. also Sibelius' Monument). 


The chief choices are surely: 

-- To choose a parsing, or 

-- To list many (perhaps rank-ordered) in case of ambiguity. 

If we select the first alternative, further choices are 

-- to have a natural parsing (human). 

-- to have a canonical parsing, in the sense of minimising 
   some variable (the minimisation of the number of bodies 
   leads us to Sibelius' Monument, its maximisation to the 
   Trivial Solution of the metatheorem (page 41)). 
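
The simplicity criterion and the two chief choices (one parsing, or a rank-ordered list) can be sketched in a few lines. This is only an illustration in Python, not part of SEE: the function names and the representation of a parsing as a list of sets of region names are assumptions, and the judgement of which parsings are "plausible" is taken as already made.

```python
def rank_parsings(parsings):
    """Order candidate parsings from fewest to most bodies.

    Each parsing is a list of bodies; each body is a set of region names.
    Only parsings already judged plausible should be passed in; that
    judgement is outside this sketch.
    """
    return sorted(parsings, key=len)


def simplest_parsing(parsings):
    """Return the plausible parsing that uses the fewest bodies."""
    return rank_parsings(parsings)[0]


# Hypothetical candidates for a scene with three regions.
candidates = [
    [{"1"}, {"2"}, {"3"}],   # one body per region (the trivial partition)
    [{"1", "2"}, {"3"}],     # two bodies
]
chosen = simplest_parsing(candidates)
```

Note that the single gigantic body is deliberately absent from the candidates: minimising the number of bodies without a plausibility filter would always select it.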


Other kinds of 2-dim data 

We have been discussing identification of 
3-dim bodies (through their 2-dim projections) in a 2-dim scene, 
purely on the basis of geometric regions. Many other kinds of information 
could be used, such as texture, color, and shadows. 


Nevertheless, it is interesting 
to see how far the identification 
of bodies can go if only geometric 
properties are used. 


Conclusion 

Finding bodies in a 2-dim scene is a task not very precisely 
defined, because of the ambiguities inherent in any projection process. 
On these grounds, the concept of 'body' is best described through 
familiarity, human opinion and consensus. We are forced to this because 
any scene could be partitioned in several ways (cf. fig. 'PARALLELEPIPED'), 
only some of which may be considered plausible or 'sensible' (natural, 
common, standard) partitions in regard to the bodies forming it. 
TOTAL ANALYSIS OF VERTICES 


Synopsis. Here a scene is considered as formed by several regions; 
bodies are adequate collections of regions. The problem of identifying 
bodies is restated as the problem of finding whether two regions 
belong or do not belong to the same body. This question is answered 
by examining the vertices of the scene. 

It is shown that a single vertex never conveys conclusive 
evidence, so that at least a pair of vertices is required to isolate a 
body; familiar and unfamiliar configurations of objects help to understand 
how the vertices are to be used in this task. 


Vertices are the important feature 

All faces of polyhedra are bounded 
by edges. 

All edges terminate in vertices. 

This thesis deals with the analysis of visual scenes composed 
mainly of three-dimensional planar objects. 



These are limited by flat surfaces 


All these bodies share as a common feature the edge: place where 
two planes [faces] meet (but see page 57 ). 




Wherever several edges or faces meet, a vertex appears. This is 
also a common feature for all the bodies. 



A body is formed by vertices with edges connecting some of these. 


When a 3-dim body is projected into a 2-dim body, its 3-dim vertices 
(which we will call genuine 3-dim vertices) are transformed into 
genuine 2-dim vertices, known as images of the 3-dim vertices, as 
figure 'GENUINE' (in next page) indicates. 

That is, a genuine 2-dim vertex has come from a genuine 3-dim 
vertex. Some 2-dim "false" vertices appear too; they do not come 


Figure 'GENUINE' 

A genuine vertex (such as G1') is one whose counterimage 
(G1 in this case) belongs to some body; a false vertex, 
such as F2', is a virtual intersection, and generally 
has no counterimage in the 3-dim world. See fig. 'NODES'. 

from genuine 3-dim vertices, but rather from the partial occlusion 
of parts of opaque bodies [transparent objects give rise to a different 
kind of false vertices; Guzman {MS Thesis} deals with them by using 
transparent models, and a mode of operation of TD, the recognizer, 
that re-interprets or ignores certain types of vertices {AFCRL-67-0133}]. 

False vertices do not belong to any object. 


Genuine and false vertices 

A classification of vertices into the 
categories "genuine" and "false" will allow isolation of objects in a 
picture; in fig. 'GENUINE', elimination of vertices F1', F2', and F3' 
divides the genuine nodes of the network (see fig. 'NODES') into two 
non-connected components, correctly separating the two bodies. 



Figure 'NODES' 

False vertices arise from the intersection of two 
projected edges, one of which is typically occluded 
in part by a face bordered by the other. Elimination 
of the false nodes F1', F2' and F3' disconnects 
the network in two separate components, which are 
the bodies sought for. 

This suggests the following 

2-dim body (first approx. to definition). Set of regions possessing 
only genuine vertices, and separated from other bodies 
by false vertices. 

In this way, the problem of identifying bodies is equivalent to the 
problem of identifying genuine vertices, segregating the false ones. 
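
If the false vertices were already known, the first-approximation definition would reduce to finding connected components in the network of vertices. The sketch below illustrates only that reduction; it is written in Python rather than Lisp, the function name and the small network are hypothetical, and it assumes the hard part (classifying vertices as genuine or false) has been done elsewhere.

```python
def split_into_bodies(edges, false_vertices):
    """Group genuine vertices into connected components.

    edges: iterable of (u, v) vertex-name pairs, the lines of the scene.
    false_vertices: set of vertex names already classified as false.
    Returns a list of sets of genuine vertex names, one set per component.
    """
    neighbours = {}
    for u, v in edges:
        for w in (u, v):
            if w not in false_vertices:
                neighbours.setdefault(w, set())
        # keep a line only when both of its ends are genuine
        if u not in false_vertices and v not in false_vertices:
            neighbours[u].add(v)
            neighbours[v].add(u)

    components, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first walk of one component
            w = stack.pop()
            if w in seen:
                continue
            seen.add(w)
            component.add(w)
            stack.extend(neighbours[w] - seen)
        components.append(component)
    return components


# Hypothetical network: a triangle G1-G2-G3 and a segment G4-G5,
# joined only through the false vertex F1.
edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G1"),
         ("G4", "G5"), ("G2", "F1"), ("F1", "G4")]
bodies = split_into_bodies(edges, {"F1"})
```

With no vertex marked false, the same network stays in one piece, which is why the classification of vertices carries all the weight.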
Problems to be solved 

The computation of this equivalence is challenged 
by several problems: 

-- The distribution and position of bodies may be such that false 
   vertices look like genuine vertices (fig. 'CAUTION'). 



Fig. 'CAUTION' 

That vertex looks genuine, but is false. 


Global information (analysis of more than one vertex) is needed 
in general to distinguish them. In other words, although false 
vertices are those which separate two bodies, and 2-dim genuine 
vertices originate from 3-dim genuine vertices, to segregate 
them requires more than the simple analysis of their shape. 


-- Some genuine vertices look like false vertices. 

-- Genuine vertices of a body may not be present in the scene, or 

-- A single body may have totally disconnected sections (portions). 

-- Continuation is not clear; some doubts arise if the object in 
   the foreground covers one or two bodies (fig. 'CONTINUATION'); 
   the simplicity criterion prefers the single body interpretation. 
Fig. 'CONTINUATION' 
Continuation is not clear. 


In brief, difficulties are of two kinds: 

-- Genuine and false vertices can not be distinguished 
   locally (see Theorem below). 

-- Even when they are completely classified, the problem of 
   fig. 'CONTINUATION' remains. 

The solution of these problems will have to make use of more global 
information. 


Classification of Vertices 

The table 'VERTICES' in next page classifies 
vertices according to their form, number of lines and angles 
among the lines. It contains the most common types; vertices having 
more edges could have been included. 

Let us consider one of these types, ARROW. Three regions called 
1, 2, and 3, form it. The standard, most common 
ARROW configuration is a body with faces 1 and 2 
seen against some other object 3. We indicate 
this by [ (1 2) (3) ]. However all other configurations are possible: 

[ (1) (2) (3) ]          [ (1 2 3) ] 


'ARROW'.- Three lines meeting at a point, with one of the angles 
          bigger than 180 degrees. 

'K'.- Two of the lines are collinear, and the other two fall on the 
      same side of such lines. 

'T'.- Three concurrent lines, two of them collinear. 

'X'.- Two of the lines are collinear, and the other two fall on 
      opposite sides of such lines. 

'PEAK'.- Formed by four or more lines, when there is an angle 
         bigger than 180 degrees. 

'MULTI'.- Vertices formed by four or more lines, and not falling 
          in any of the preceding types. 

TABLE 'VERTICES' 

Classification of rectilinear vertices. 

Thus, for an ARROW, all the groupings of its faces are possible; any 
procedure that, by looking at an ARROW alone, tries to decide how its faces 
are grouped into bodies, is bound to make mistakes on some scene. 

The generalisation of the above analysis to all other types of 
vertices proves the following 

"Theorem". There does not exist a set of local decision procedures 
[Hj], each one looking at or getting information from one vertex 
and establishing b-equivalences among some of its faces 
(two faces a and b are b-equivalent, indicated a ≡ b, if 
the procedure decides that they belong to the same body; this is 
an equivalence relation), using information only from that 
vertex (it does not look at the other vertices or at the values 
of the H's at the other vertices), which will partition all 
scenes correctly. 

That is, the following machine will not work for all scenes: 

[Figure: one local procedure Hj per vertex, all feeding a box labelled 'Results'.] 

The Hj decide by processing information at exactly one vertex; 
the box on the right accepts all these decisions and passes them as 
results. No matter what set of Hj we choose, there exists a scene 
that induces an incorrect partition by our machine. 
A stronger assertion is that, in view of inherent ambiguity, 
there is not even any global procedure! 

All the different groupings of regions of a vertex into bodies 
are possible; this is illustrated by the following complete set of 
scenes, each one of them showing a different partitioning of a type 
of vertex. These examples are useful also in giving an idea of 
unusual, as well as familiar scenes; we will have later occasion to 
use them, when searching for heuristics to form bodies. 

Generation of partitions 


compo ((1 2)) 

 1  ((1) (2)) 
 2  ((1 2)) 

There are only two partitions of a set of two elements. 


compo ((1 2 3)) 

 1  ((1) (2) (3)) 
 2  ((1 2) (3)) 
 3  ((1 3) (2)) 
 4  ((1) (2 3)) 
 5  ((1 2 3)) 

Partitions of a set of three elements: 5. 


compo ((1 2 3 4)) 

 1  ((1) (2) (3) (4)) 
 2  ((1 2) (3) (4)) 
 3  ((1 3) (2) (4)) 
 4  ((1 4) (2) (3)) 
 5  ((1) (2 3) (4)) 
 6  ((1 2 3) (4)) 
 7  ((1 4) (2 3)) 
 8  ((1) (2 4) (3)) 
 9  ((1 2 4) (3)) 
10  ((1 3) (2 4)) 
11  ((1) (2) (3 4)) 
12  ((1 2) (3 4)) 
13  ((1 3 4) (2)) 
14  ((1) (2 3 4)) 
15  ((1 2 3 4)) 

Partitions of a set of four elements: 15. 


Figures in the next few pages are 
numbered according to the numbers 
in the leftmost column in these 
tables. 
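
The tables of partitions can be regenerated mechanically. The following Python sketch borrows the name compo from the tables, but it is not the thesis's Lisp function: the recursion shown is one standard way to enumerate set partitions, and the order in which it lists them need not match the numbering used for the figures.

```python
def compo(elements):
    """Return every partition of the list `elements` into non-empty blocks.

    Each partition is a list of blocks; each block is a list of elements.
    """
    if not elements:
        return [[]]                      # the empty set has one (empty) partition
    first, rest = elements[0], elements[1:]
    result = []
    for partition in compo(rest):
        # put `first` into each existing block in turn ...
        for i, block in enumerate(partition):
            result.append(partition[:i] + [[first] + block] + partition[i + 1:])
        # ... or give it a block of its own
        result.append([[first]] + partition)
    return result


counts = [len(compo(list(range(1, n + 1)))) for n in (2, 3, 4)]
```

The counts obtained for two, three and four elements are 2, 5 and 15, in agreement with the tables.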
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Digression 1. An alternate approach 


Suggestion 

As an alternate approach, one could try to use the faces as a 
basis for identification. For instance, use two scenes (left image, 
right image) or pictures, localize a sharp feature in one of them 
(vertex, crack in the face, peculiar texture, etc.) and by correlation 
or some other method, find it also in the other picture. Having 
found a few points in both images in this manner, determine the plane 
of the face, in 3-dim space. When several faces are thus identified, 
we can compute, if desired, their intersection and obtain the edges 
(lines). It will generally suffice to ignore the edges and rely on 
the faces. Since it is reasonable to expect considerable difficulty 
in finding lines and in differentiating lines caused by edges from 
those caused by shadows, an approach which avoids the lines altogether 
looks promising. But in this case, in addition to requiring two 
images, several correlations are needed (if we choose this method), 
a generally time-consuming and error-prone task. 
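
The geometric part of this suggestion (a plane from a few points of a face, and an edge as the intersection of two such planes) is simple once the 3-dim points are available. The Python sketch below covers only that part; it assumes the correlation between the two images has already produced 3-dim coordinates, and all names in it are hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_through(p, q, r):
    """Plane through three non-collinear 3-dim points, as (n, d) with n . x = d."""
    n = cross(sub(q, p), sub(r, p))
    if n == (0, 0, 0):
        raise ValueError("the three points are collinear")
    return n, dot(n, p)


def edge_direction(plane_a, plane_b):
    """Direction of the line (edge) in which two non-parallel faces meet."""
    direction = cross(plane_a[0], plane_b[0])
    if direction == (0, 0, 0):
        raise ValueError("the two planes are parallel")
    return direction


# Two hypothetical faces: the plane z = 0 and the plane x = 0.
floor = plane_through((0, 0, 0), (1, 0, 0), (0, 1, 0))
wall = plane_through((0, 0, 0), (0, 1, 0), (0, 0, 1))
edge = edge_direction(floor, wall)
```

With noisy measurements one would fit the plane to more than three points (for instance by least squares) instead of passing it exactly through three.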
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SEE, A PROGRAM THAT FINDS BODIES IN A SCENE 


Synopsis 


How SEE works. 


Algorithms and heuristics are presented, implemented in a 
program, that analyze a scene into a composition of three-dimensional 
objects. Only the two-dimensional representation of the three-dimensional 
scene is available as input, and is described by a 
collection of surfaces, lines and vertices. 

SEE looks for three-dimensional objects in two-dimensional scenes. 
The program does not require a pre-conceived idea of the form of the 
objects which could appear in the scenes. It is only assumed that 
they will be solid objects formed by plane surfaces. Thus, SEE can 
not find "pentagonal prisms" or "houses" in a scene, since it does 
not know what a "pentagonal prism" is; but it will usually isolate 
the pentagonal prisms (or any other regular or irregular solid) in a 
scene, even if some of them are partially occluded, without having 
a description of such objects. It does this by paying attention 
to configurations of surfaces and lines which would make plausible 
three-dimensional solids, and in this way 'bodies' are identified. 

The analysis that SEE makes of the different scenes generally 
agrees with human opinion, although in some ambiguous cases it 
tends to be conservative. The most interesting thing about the 
program is how well it deals with occlusions. Many examples in 
the next section 'Analysis of many scenes' illustrate the features 
and peculiarities of the program, and also illustrate the effects 
of inaccuracies introduced in the data. 
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INTRODUCTION 


Here is a program that locates objects in an optical image of a 
scene most likely composed of three-dimensional solids, perhaps 
occluding one another, so that some of them may not be totally 
visible. We use a line drawing as our representation of the scene. 

The analysis of scene L10 (see figure 'L10' in next page) by 
our program, named SEE, produces 
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IS 
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Division of work in computer vision 

In trying to construct a program for seeing, several approaches 
are possible; most of them require some of the following set of 
modular programs or subroutines. 

Pre-processing. Converts the image from a 2-dim array of intensities 
to a symbolic representation or 'internal format' (page 66 ), in 
terms of vertices and lines connecting them. 

Homogeneity predicates . They decide if areas of the picture are 
inhomogeneous, and hence require further analysis (page Ifo). 

Color predicates . Boundaries of different color suggest lines. 

Line finder . Locates lines of points having certain property 
(such as being inhomogeneous, or having a large light intensity 
gradient). 

Vertex finder . Concurrent lines are merged, or a vertex is created 
at their meeting point. 

Consolidator . Eliminates the false lines and finds more lines, 
incrementing in this way as much as possible the reliability of the 
system. 
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Illumination program . Discovers where the main light sources are. 
Shadows program . Detects shadows so as to eliminate them. 

Missing lines program . General shape considerations suggest places 
where faint lines can remain undetected. 

Body recognition . Partitions the scene into appropriate subsets, each 
one being a body or object. Thus, SEE is a body-recognition program. 

Object identification . These objects are compared against abstract 
descriptions (models) of cubes, pyramids, etc., so that a classification 
is done, and a name is attached to each one. In the process, certain 
parameters may acquire values: the height of the pyramid is observed. 

Positioning . Having analyzed the scene, the relevant objects are 
positioned in three-dimensional space, and additional relations among 
them are discovered (support, obstruction, etc.). Enough information 
is obtained to allow the mechanical arm to manipulate the objects and 
achieve its goals. 

Stereo . More than one view are analyzed (page23S) and from them, 
3-dim spatial positions are found. 

Focussing . The computer, by adjusting the focus of its lens, 
acquires knowledge of how far the objects are. 

Feedback among these parts is more necessary as the complexity of the 
scene and of the desired goals increases. 

Recognizer . The task of body recognition and body identification was 
formerly accomplished by a single program (for instance, DT or TD {my 
MS Thesis}) that compares the symbolic description of the scene against 
the symbolic or abstract description of the model of the desired object, 
in a kind of two-dimensional matching, to isolate instances of that 
object in the scene. 

Technical descriptions of SEE 

1. Annotated listings . Above all, the primary source of information 
is the listing of the programs, which appears complete in this thesis. 

They are written in Lisp. If, despite my efforts, some of my explanations 
are not clear, consult it: it is annotated. The programs themselves, 
examples, test data, results, instructions, etc., are in the DEC 
magnetic tape "GUZMAN F" at Project MAC (AI group). Instructions 
are given in page 78. 

2. This section of the thesis contains a description and discussion 
of the different algorithms and procedures used. 

3. Published papers that cover part of the material at somewhat 
less depth, and therefore are more readable, are also available 
{FJCC 68} {Pisa 68}. Except that they contain some examples not 
included here, they contain no other information not covered here. 

4. An internal report {MAC M 357} described an earlier version of SEE. 
INPUT FORMAT 


Eventually, several preprocessors will be able to receive data 
through an input camera and reduce it to the "internal format" of a 
scene, in the form required by SEE. For testing purposes, the scenes 
are entered by hand in a simplified format, called 'input format', 
to be described now. All the scenes analyzed by SEE have been written 
in input format. 


Example. R3. The input format of scene R3 is 

(DEFPROP R3 (%:7) BACKGROUND) 

(NOT (SETQ R3 (QUOTE ( 
    %A    4.3     4.5    (%:7 %G %:4 %C %:1 %B) 
    %B    4.0     5.7    (%:7 %A %:1 %D) 
    %C    4.6     6.5    (%:4 %F %:2 %D %:1 %A) 
    %D    4.5     9.15   (%:7 %B %:1 %C %:2 %E) 
    %E    5.65    9.25   (%:7 %D %:2 %F) 
    %F    5.65    8.6    (%:7 %E %:2 %C %:4 %G) 
    %G    6.6     5.2    (%:7 %F %:4 %A) 
    %H    6.9    15.4    (%:7 %L %:3 %K %:5 %I) 
    %I    8.5    16.0    (%:7 %H %:5 %J) 
    %J   11.8    12.6    (%:7 %I %:5 %K %:6 %N) 
    %K   10.0    11.9    (%:6 %J %:5 %H %:3 %M) 
    %L    7.1    13.2    (%:7 %M %:3 %H) 
    %M   10.0     9.7    (%:7 %N %:6 %K %:3 %L) 
    %N   11.65   10.3    (%:7 %J %:6 %M) 
)))) 

R3 IN INPUT FORMAT 


The first line declares :7 to be the background.* We have to 
tell SEE which regions belong to the background. If this information 
is missing, a program is called that will compute the regions that 
belong to the background (see section 'Background discrimination by 
computer') prior to other calculations. 

After that, the lines associate with each vertex its 2-dim coordinates 
and a list (which will later be called 'KIND'), in counterclockwise 
order, of regions and vertices radiating from that vertex. 

The function PREPARA (see listing) converts the scene as just given 
to the "internal format" form which SEE expects. It does this by putting 
many properties in the property lists of the atoms representing vertices 
and regions (property lists in Lisp are explained in next page). 


*For the moment, ignore the % signs. They are used to distinguish 
right from left scenes. 
Property lists in Lisp * 


Each atomic expression in Lisp has a 
property list, which is a place where facts can be stored. 

If it is desired to represent the fact that John is a 69 years 
old male, has a wife called Jacqueline, and a height of value 1.77 m, 
we could proceed in Lisp as follows: 

(1) We will agree that the atom 'JOHN' will represent our man. 

(2) In the property list of 'JOHN' we will store several properties 
or indicators and their values, using the function PUTPROP, that 
stores information in the property list; thus 

(Putprop (quote John) (quote Jacqueline) (quote Wife)) 

will add, under the indicator or property 'Wife', the value 
'Jacqueline': 

JOHN 
  | 
  WIFE -- JACQUELINE 

(3) Hence, the representation of our facts in Lisp is 

JOHN 
  | 
  SEX    -- MALE 
  AGE    -- 69.0 
  WIFE   -- JACQUELINE 
  HEIGHT -- (1.77 m) 

(4) In fact, the property list of 'JOHN', which is the CDR of 'JOHN' 
in Lisp 1.6 {MAC M 313}, is 

(SEX MALE AGE 69.0 WIFE JACQUELINE HEIGHT (1.77 m) ...) 

(5) If later we want to know the age of John, we will ask 

(Get (quote John) (quote Age)) 
and the value will be 69.0 
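
For readers more at home outside Lisp, the same idea can be imitated with a table of tables. The Python sketch below is only an analogy: putprop and get are hypothetical helpers named after the Lisp functions, with the argument order (atom, value, indicator) copied from the Putprop call shown above.

```python
PROPERTY_LISTS = {}   # atom name -> {indicator: value}


def putprop(atom, value, indicator):
    """Store `value` under `indicator` in the property list of `atom`."""
    PROPERTY_LISTS.setdefault(atom, {})[indicator] = value
    return value


def get(atom, indicator):
    """Return the value stored under `indicator`, or None if there is none."""
    return PROPERTY_LISTS.get(atom, {}).get(indicator)


putprop("JOHN", "MALE", "SEX")
putprop("JOHN", 69.0, "AGE")
putprop("JOHN", "JACQUELINE", "WIFE")
putprop("JOHN", (1.77, "m"), "HEIGHT")

age = get("JOHN", "AGE")
```

Asking for an indicator that was never stored returns None here, which plays the role of NIL.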


* This paragraph, which can be skipped if it is known what a 
property list is, will make the next section clearer. 
FORMAT OF SCENE R3 

R3 
    REGIONS     (%:6 %:5 %:3 %:2 %:1 %:4 %:7) 
    VERTICES    (%N %M %L %K %J %I %H %G %F %E %D %C %B %A) 
    BACKGROUND  (%:7) 

REGIONS 

%:6 NEIGHBORS   (%:5 %:3 %:7 %:7) 
    KVERTICES   (%K %M %N %J) 
    FOOP        ((%:5 %K %:3 %M %:7 %N %:7 %J)) 
%:5 NEIGHBORS   (%:3 %:6 %:7 %:7) 
    KVERTICES   (%K %J %I %H) 
    FOOP        ((%:3 %K %:6 %J %:7 %I %:7 %H)) 
%:3 NEIGHBORS   (%:7 %:7 %:6 %:5) 
    KVERTICES   (%L %M %K %H) 
    FOOP        ((%:7 %L %:7 %M %:6 %K %:5 %H)) 
%:2 NEIGHBORS   (%:4 %:7 %:7 %:1) 
    KVERTICES   (%F %E %D %C) 
    FOOP        ((%:4 %F %:7 %E %:7 %D %:1 %C)) 
%:1 NEIGHBORS   (%:4 %:2 %:7 %:7) 
    KVERTICES   (%C %D %B %A) 
    FOOP        ((%:4 %C %:2 %D %:7 %B %:7 %A)) 
%:4 NEIGHBORS   (%:2 %:1 %:7 %:7) 
    KVERTICES   (%C %A %G %F) 
    FOOP        ((%:2 %C %:1 %A %:7 %G %:7 %F)) 
%:7 NEIGHBORS   (%:6 %:6 %:3 %:3 %:5 %:5 %:2 %:2 %:4 %:4 %:1 %:1) 
    KVERTICES   (%N %M %L %H %I %J %E %F %G %A %B %D) 
    FOOP        ((%:6 %N %:6 %M %:3 %L %:3 %H %:5 %I %:5 %J) 
                 (%:2 %E %:2 %F %:4 %G %:4 %A %:1 %B %:1 %D)) 

VERTICES 

%N  XCOR 11.65   YCOR 10.3    NVERTICES (%J %M)      NREGIONS (%:7 %:6) 
    KIND (%:7 %J %:6 %M) 
    TYPE (L (%:6 %:7)) 
%M  XCOR 10.0    YCOR 9.7     NVERTICES (%N %K %L)   NREGIONS (%:7 %:6 %:3) 
    KIND (%:7 %N %:6 %K %:3 %L) 
    TYPE (ARROW (%K %M %N %L %:6 %:3 %:7)) 
%L  XCOR 7.1     YCOR 13.2    NVERTICES (%M %H)      NREGIONS (%:7 %:3) 
    KIND (%:7 %M %:3 %H) 
    TYPE (L (%:3 %:7)) 
%K  XCOR 10.0    YCOR 11.9    NVERTICES (%J %H %M)   NREGIONS (%:6 %:5 %:3) 
    KIND (%:6 %J %:5 %H %:3 %M) 
    TYPE (FORK %K) 
%J  XCOR 11.8    YCOR 12.6    NVERTICES (%I %K %N)   NREGIONS (%:7 %:5 %:6) 
    KIND (%:7 %I %:5 %K %:6 %N) 
    TYPE (ARROW (%K %J %I %N %:5 %:6 %:7)) 
%I  XCOR 8.5     YCOR 16.0    NVERTICES (%H %J)      NREGIONS (%:7 %:5) 
    KIND (%:7 %H %:5 %J) 
    TYPE (L (%:5 %:7)) 
%H  XCOR 6.9     YCOR 15.4    NVERTICES (%L %K %I)   NREGIONS (%:7 %:3 %:5) 
    KIND (%:7 %L %:3 %K %:5 %I) 
    TYPE (ARROW (%K %H %L %I %:3 %:5 %:7)) 
%G  XCOR 6.6     YCOR 5.2     NVERTICES (%F %A)      NREGIONS (%:7 %:4) 
    KIND (%:7 %F %:4 %A) 
    TYPE (L (%:4 %:7)) 
%F  XCOR 5.65    YCOR 8.6     NVERTICES (%E %C %G)   NREGIONS (%:7 %:2 %:4) 
    KIND (%:7 %E %:2 %C %:4 %G) 
    TYPE (T (%C %F %G %E %:2 %:4 %:7)) 
%E  XCOR 5.65    YCOR 9.25    NVERTICES (%D %F)      NREGIONS (%:7 %:2) 
    KIND (%:7 %D %:2 %F) 
    TYPE (L (%:2 %:7)) 
%D  XCOR 4.5     YCOR 9.15    NVERTICES (%B %C %E)   NREGIONS (%:7 %:1 %:2) 
    KIND (%:7 %B %:1 %C %:2 %E) 
    TYPE (ARROW (%C %D %B %E %:1 %:2 %:7)) 
%C  XCOR 4.6     YCOR 6.5     NVERTICES (%F %D %A)   NREGIONS (%:4 %:2 %:1) 
    KIND (%:4 %F %:2 %D %:1 %A) 
    TYPE (FORK %C) 
%B  XCOR 4.0     YCOR 5.7     NVERTICES (%A %D)      NREGIONS (%:7 %:1) 
    KIND (%:7 %A %:1 %D) 
    TYPE (L (%:1 %:7)) 
%A  XCOR 4.3     YCOR 4.5     NVERTICES (%G %C %B)   NREGIONS (%:7 %:4 %:1) 
    KIND (%:7 %G %:4 %C %:1 %B) 
    TYPE (ARROW (%C %A %G %B %:4 %:1 %:7)) 

TABLE 'R3 IN INTERNAL FORMAT' 










INTERNAL FORMAT 


The program assumes the scene in a special symbolic format, 
which, basically, is an arrangement of relations between vertices and 
regions, which are represented by atoms having adequate properties 
in their property lists. 

A scene has a name which identifies it; this name is an atom 
whose property list contains the properties 'REGIONS', 'VERTICES', 
and 'BACKGROUND'. For example, the scene R3 (see figure R3) has the 
name 'R3'. In the property list of R3 we find (see also table 'R3 IN 
INTERNAL FORMAT') 

REGIONS (%:6 %:5 %:3 %:2 %:1 %:4 %:7) 

    Unordered list of regions 
    composing the scene R3. 

VERTICES (%N %M %L %K %J %I %H %G %F %E %D %C %B %A) 

    Unordered list of vertices 
    composing the scene R3. 

BACKGROUND (%:7) 

    Unordered list of regions 
    composing the background of 
    scene R3. 

Region. A region corresponds to a surface limited by simple closed curves. 
Regions are represented by atoms that start with a colon (:). For instance, 
in R3, the surface delimited by the vertices K J N M is a region, 
called :6, but D E F G A C is not. 

Each region has as name an atom which possesses additional properties 
describing different attributes of the region in question. These 
are 'NEIGHBORS', 'KVERTICES', and 'FOOP'. For example, the region in 
scene R3 formed by the lines DE, EF, FC, CD has ':2' as its name. 

In the property list of :2 we find: 

NEIGHBORS (%:4 %:7 %:7 %:1) 

    Counterclockwise ordered list of 
    all regions which are neighbors to 
    :2. For each region, this list is 
    unique up to cyclic permutation. 

KVERTICES (%F %E %D %C) 

    Counterclockwise ordered list of 
    all vertices which belong to 
    region :2. This list is unique 
    up to cyclic permutation. 

FOOP ((%:4 %F %:7 %E %:7 %D %:1 %C)) 

    Each sublist is a counterclockwise 
    ordered list of alternating 
    neighbors and kvertices of :2. 
    Each sublist is unique up to cyclic 
    permutation, and indicates a 
    simple boundary. 


Each sublist of the FOOP property of a region is formed by a 
man who walks on its boundary always having this region to his left, 
and takes note of the regions to his right and of the vertices which 


he finds in his way. 
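
The walk just described can be carried out using nothing but the KIND lists of the vertices. The Python sketch below is one possible way to do it and is not taken from the listing of PREPARA; the function name, the dictionary layout and the omission of the % prefixes are assumptions. It relies on the reading that, in a KIND list, each region lies between the vertex listed before it and the vertex listed after it.

```python
# KIND lists of scene R3 as entered in input format (names without "%").
R3_KIND = {
    "A": [":7", "G", ":4", "C", ":1", "B"],
    "B": [":7", "A", ":1", "D"],
    "C": [":4", "F", ":2", "D", ":1", "A"],
    "D": [":7", "B", ":1", "C", ":2", "E"],
    "E": [":7", "D", ":2", "F"],
    "F": [":7", "E", ":2", "C", ":4", "G"],
    "G": [":7", "F", ":4", "A"],
    "H": [":7", "L", ":3", "K", ":5", "I"],
    "I": [":7", "H", ":5", "J"],
    "J": [":7", "I", ":5", "K", ":6", "N"],
    "K": [":6", "J", ":5", "H", ":3", "M"],
    "L": [":7", "M", ":3", "H"],
    "M": [":7", "N", ":6", "K", ":3", "L"],
    "N": [":7", "J", ":6", "M"],
}


def foop(kinds, region):
    """Trace every boundary of `region`, keeping the region on the left.

    Returns a list of loops; each loop alternates the region seen on the
    right with the vertex reached, as in the FOOP property.
    """
    corners = [(v, i) for v, kind in kinds.items()
               for i in range(0, len(kind), 2) if kind[i] == region]
    seen, loops = set(), []
    for start in corners:
        if start in seen:
            continue
        loop, (v, i) = [], start
        while (v, i) not in seen:
            seen.add((v, i))
            kind = kinds[v]
            nxt = kind[i - 1]        # vertex listed just before the region
            right = kind[i - 2]      # region on the other side of that line
            loop += [right, nxt]
            k2 = kinds[nxt]
            # at the next vertex, the region sits just before the vertex we left
            j = next(j for j in range(1, len(k2), 2)
                     if k2[j] == v and k2[j - 1] == region)
            v, i = nxt, j - 1
        loops.append(loop)
    return loops


foop_2 = foop(R3_KIND, ":2")
foop_7 = foop(R3_KIND, ":7")
```

NEIGHBORS and KVERTICES then follow from each loop as its odd and even positioned elements, `loop[0::2]` and `loop[1::2]`. For the background :7 the walk closes twice, once around each group of blocks, which is why its FOOP has two sublists.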


As another example, in the property list of :7 we find: 

NEIGHBORS (%:6 %:6 %:3 %:3 %:5 %:5 %:2 %:2 %:4 %:4 %:1 %:1) 

KVERTICES (%N %M %L %H %I %J %E %F %G %A %B %D) 

FOOP ((%:6 %N %:6 %M %:3 %L %:3 %H %:5 %I %:5 %J) 
      (%:2 %E %:2 %F %:4 %G %:4 %A %:1 %B %:1 %D)) 


Vertex. A vertex is the point where two or more lines of the scene 
meet; for instance, A, G, and K are vertices of the scene R3. Each 
vertex has as name an atom which possesses additional properties describing 
different attributes of the vertex in question. These are 
'XCOR', 'YCOR', 'NVERTICES', 'NREGIONS', 'KIND', 'TYPE', and 'NEXTE'. 
For example, vertex J (see scene R3) has in its property list: 

XCOR 11.799999 

    x-coordinate 

YCOR 12.600000 

    y-coordinate 

NVERTICES (%I %K %N) 

    Counterclockwise ordered list of 
    vertices to which J is connected. 
    Unique up to cyclic permutation. 

NREGIONS (%:7 %:5 %:6) 

    Counterclockwise ordered list of 
    regions to which J is connected. 
    Unique up to cyclic permutation. 

KIND (%:7 %I %:5 %K %:6 %N) 

    Counterclockwise ordered list of 
    alternating nregions and nvertices 
    of J. This list is unique up to 
    cyclic permutation. 

TYPE (ARROW (%K %J %I %N %:5 %:6 %:7)) 

    List of two elements; the first is 
    an atom indicating the type-name 
    of J; the second is the datum of J. 
    To be explained in next section. 

(NEXTE) 

    Vertex J does not have the indicator 
    NEXTE in its property list. 

The KIND property of a vertex is formed by a man who stands at 
the vertex and, while rotating counterclockwise, takes note of the 
regions and vertices which he sees. NREGIONS and NVERTICES are then 
easily derived from KIND, by taking its odd positioned elements, and 
its even positioned elements, respectively. 
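As a small illustration (Python, not the LISP of the program; the function name is hypothetical), the derivation of NREGIONS and NVERTICES from KIND amounts to taking alternate elements. The sample list is the KIND of vertex J given above, with the % signs dropped.

```python
def nregions_and_nvertices(kind):
    """Split a KIND list into (NREGIONS, NVERTICES).

    Positions are counted from 1 in the text, so the odd positioned
    elements are kind[0], kind[2], ... and the even ones kind[1], kind[3], ...
    """
    return kind[0::2], kind[1::2]


kind_j = [":7", "I", ":5", "K", ":6", "N"]
nregions_j, nvertices_j = nregions_and_nvertices(kind_j)
```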

NEXTE is a property that appears in certain vertices (none in
scene R3); it will be explained in the next section.

The property TYPE is also put by the function PREPARA; it classifies
each vertex into one of several types, as described in table
'VERTICES' (next page).
'FORK'.-   Three lines forming angles smaller than 180 degrees.

'ARROW'.-  Three lines meeting at a point, with one of the angles
           bigger than 180 degrees.

'K'.-      Two of the lines are collinear, and the other two fall
           on the same side of such lines.

'T'.-      Three concurrent lines, two of them collinear.

'X'.-      Two of the lines are collinear, and the other two fall
           on opposite sides of such lines.

'PEAK'.-   Formed by four or more lines, when there is an angle
           bigger than 180°.

'MULTI'.-  Vertices formed by four or more lines, and not falling
           in any of the preceding types.

TABLE 'VERTICES'

Classification of rectilinear vertices.
TYPES OF VERTICES


The disposition, slope and number of lines which form a vertex
are used to classify it, a task performed by the function
(TYPEGENERATOR L), which stores the corresponding type in the
property list of the vertex.

The TYPE of a vertex is always a list of two elements; the first
is the type-name: one of 'L', 'FORK', 'ARROW', 'T', 'K', 'X', 'PEAK',
'MULTI'; the second element is the datum, which generally is a list
whose form varies with the type-name and contains information in a
determined order about the vertex in question (see table 'VERTICES').
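The classification of table 'VERTICES' can be sketched from the angles alone. The following Python fragment is an illustration under stated assumptions, not the function TYPEGENERATOR: the name `classify_vertex`, the tolerance of one degree used to decide collinearity, and the input format (coordinates of the vertex and of the far ends of its lines) are all hypothetical. It returns only the type-name, not the datum, and expects at least two lines.

```python
import math


def classify_vertex(center, ends, tol=1.0):
    """Return the type-name of the vertex at `center`, given the far ends
    of the lines meeting there. `tol` is an angular tolerance in degrees."""
    cx, cy = center
    angles = sorted(math.degrees(math.atan2(y - cy, x - cx)) % 360.0
                    for x, y in ends)
    n = len(angles)
    # Angles between consecutive lines, going counterclockwise.
    gaps = [(angles[(i + 1) % n] - angles[i]) % 360.0 for i in range(n)]
    reflex = any(g > 180.0 + tol for g in gaps)   # an angle bigger than 180
    if n == 2:
        return "L"
    if n == 3:
        if reflex:
            return "ARROW"
        if any(abs(g - 180.0) <= tol for g in gaps):
            return "T"
        return "FORK"
    if n == 4:
        for i in range(4):
            for j in range(i + 1, 4):
                if abs((angles[j] - angles[i]) - 180.0) <= tol:
                    # Lines i and j are collinear; see where the others fall.
                    inside = [i < k < j for k in range(4) if k not in (i, j)]
                    return "K" if inside[0] == inside[1] else "X"
    return "PEAK" if reflex else "MULTI"
```

For instance, `classify_vertex((0, 0), [(1, 1), (1, 0), (1, -1)])` gives 'ARROW', since the three lines leave an angle of 270 degrees on one side.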

Vertices where two lines meet. 


L. - A vertex formed by only two lines is always classified as of type 'L'.
Two angles exist at it, one bigger and the other smaller than 180°. The
datum is a list of the form

(E1 E2), where E1 is the region which contains
the angle smaller than 180°.

E2 is the region which contains
the angle greater than 180°.

For instance, in scene R3 (see fig. 'R3'),
G has in its property list:

TYPE (L (%:4 %:7))

The vertices of type L present in R3
are B, E, G, I, L, N.



Vertices where three lines meet. 


FORK. - Three lines meeting at a point and forming angles smaller than 
180° form a FORK. 
Its datum is the vertex itself
at which the fork occurs. For instance,
vertex K has in its property list
TYPE (FORK %K)

The vertices of type FORK present
in R3 are C, K.



ARROW. - Three lines meeting at a point, with one of the angles bigger 
than 180°. 

The datum of an ARROW is a list like
(E1 E2 E3 E4 E5 E6 E7) where

E1 is the vertex at the 'tail',

E2 is the vertex at the center,

E3 is the vertex at the left of E1 -> E2,

E4 is the vertex at the right,

E5 is the region at the left,

E6 is the region at the right,

E7 is the region which contains the angle bigger than 180°.

For instance, vertex H has in its property list

TYPE (ARROW (%K %H %L %I %:3 %:5 %:7))     (fig. R3)

The vertices of type ARROW present in R3 are A, D, H, J, M.
E6 is the region which contains the angle smaller than 90 degrees.
E7 is the "central" region (where the 180° angle is).

For instance, vertex F (fig. R3) has in its property list
TYPE (T (%C %F %G %E %:2 %:4 %:7))

The only vertex of type T present in R3 is F.

See also "Matching T's or Nextes" below.

Vertices where four lines meet. 

K.- When two of the lines are collinear, and the other two fall on the
same side of such lines. The datum is a list of the form
(E1 E2 E3 E4 E5 E6 E7 E8) where

E1 is the central region.

E2 is the region having the 180° angle.

E3 is the collinear vertex which falls
to the left of E1 -> E2.

E4 is the region to the left of E1 -> E2.

E5 is the vertex to the left of E1 -> E2.

E6 is the collinear vertex which falls to the right.

E7 is the region to the right of E1 -> E2.

E8 is the other vertex to the right (of E1 -> E2).

R3 contains no vertices of type K. PA of figure BRIDGE is of type 'K'.
X. - When two of the lines are collinear, and the other two
fall on opposite sides of such lines. The datum is a list of the form
(E1 E2 E3 E4 E5 E6), where

E1 is one of the collinear vertices.

E2 is the region to the left of E1 C,
where C is the vertex at the center.

E3 is the region to the right of E1 C.

E4 is the other collinear vertex.

E5 is the region to the left of E4 C.

E6 is the region to the right of E4 C.

For instance, we find in the property list of F
(figure BRIDGE):

TYPE (X (QA :26 :22 G :21 :30))

The only vertex of type X present in BRIDGE is F.

The datum for an X may also be in the form (E4 E5 E6 E1 E2 E3).
Vertices of four lines which are not of type K or X are either of
type PEAK or MULTI.


Other types of vertices. 



MULTI. - Vertices formed by four or more lines, and not falling in any
of the preceding types, belong to the type MULTI. R3 contains
no PEAKs or MULTIs.

The datum for vertices of type PEAK is of the form (E1 E2 E3), where
E2 is the region that contains the angle bigger than 180 degrees;
E1 is the vertex before E2, and E3 is the vertex after (in the
counterclockwise sense).

The datum for vertices of type MULTI is of the form E1, where
E1 is the vertex itself.

NEXTEs or Matching T's. Two T's which are collinear and facing each other
(see figure) are called "matching T's", and each one is the "nexte"
of the other. The indicator 'NEXTE' is placed in such vertices.

If the region E7 of a T (see figure) is the background, that
T can not be a matching T.
In the figure, E2 and F2 are matching T's because E1-E2 is
collinear with F1-F2. It is not required of E3-E4 to be parallel to
F3-F4. If several pairs of T's are possible, the closest is chosen:

P - Q are matching T's,
and not P - R.

The matching T's will get involved in the determination of places
where a body is occluded by another object and later emerges visible
again.
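The search for matching T's may be sketched as follows. This Python fragment is only an illustration, with several assumptions that are not taken from the program: each T is given by the coordinates of its tail E1 and of its center E2; "facing each other" is read as each center lying ahead of the other T's stem; the collinearity tolerance `tol` is absolute; and pairs are chosen greedily, closest first. The tests on background regions described in the text are left out. All names are hypothetical.

```python
import math


def _cross(o, a, b):
    """Twice the signed area of triangle o, a, b (zero when collinear)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _ahead(tail, center, p):
    """True when p lies beyond `center` in the direction tail -> center."""
    return ((center[0] - tail[0]) * (p[0] - center[0]) +
            (center[1] - tail[1]) * (p[1] - center[1])) > 0


def could_match(t, u, tol=1e-6):
    """t and u are (tail, center) pairs of two T's."""
    (t1, t2), (u1, u2) = t, u
    collinear = abs(_cross(t1, t2, u1)) <= tol and abs(_cross(t1, t2, u2)) <= tol
    return collinear and _ahead(t1, t2, u2) and _ahead(u1, u2, t2)


def find_nextes(tees):
    """tees: dict name -> (tail, center). Returns dict name -> name of its nexte."""
    candidates = sorted(
        (math.dist(tees[a][1], tees[b][1]), a, b)
        for a in tees for b in tees
        if a < b and could_match(tees[a], tees[b])
    )
    nexte = {}
    for _, a, b in candidates:          # closest pairs are served first
        if a not in nexte and b not in nexte:
            nexte[a], nexte[b] = b, a
    return nexte


# Three T's on one straight line; P could match Q or R, Q is closer.
tees = {"P": ((0, 0), (1, 0)), "Q": ((4, 0), (3, 0)), "R": ((7, 0), (6, 0))}
nextes = find_nextes(tees)
```

In this made-up arrangement P and Q become nextes of each other and R is left without one, as in the P - Q - R case above.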








For two T's to be NEXTEs or matching T's, it is required that
neither E7 nor F7 be background. The requirement should be extended to
all regions between E7 and F7, since a body can not go "under" the
background region:

A and B can not be NEXTEs, since :11 is the background.

Two straight lines always intersect (possibly at infinity); a way to
detect these background regions is to write functions (subroutines)
that find out if two segments of line intersect, or if one segment
intersects with a line. [SUGGESTION]


LINES AND SEGMENTS

In the plane, two straight lines always meet.
Two segments, or a line and a segment, may or
may not meet.
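Functions of the kind suggested can be sketched with the usual orientation test. The Python below is an illustration only; the names `segments_meet` and `line_meets_segment` are hypothetical, points are (x, y) pairs, and exact arithmetic is assumed (no tolerance for noisy coordinates).

```python
def _orient(p, q, r):
    """Sign of the turn p -> q -> r: +1 counterclockwise, -1 clockwise, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _within_box(p, q, r):
    """For r collinear with p and q: does r lie on segment pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0]) and
            min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_meet(a, b, c, d):
    """True when segment ab and segment cd have a point in common."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _within_box(a, b, c)) or
            (o2 == 0 and _within_box(a, b, d)) or
            (o3 == 0 and _within_box(c, d, a)) or
            (o4 == 0 and _within_box(c, d, b)))


def line_meets_segment(p, q, c, d):
    """True when the infinite line through p and q meets segment cd."""
    oc, od = _orient(p, q, c), _orient(p, q, d)
    return oc != od or oc == 0
```

For example, the segment from (0, 0) to (1, 1) does not meet the segment from (2, 0) to (0, 3), although the line through (0, 0) and (1, 1) does.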









FIGURE 'TOWER' 
FIGURE 'MOMO'
THE PROGRAM


We now describe SEE, and how it achieves its goals, by discussing 
the procedures, heuristics, etc., employed and the way they work. 

We begin with several examples. 

Example A. Scene 'TOWER'. This scene (see figure 'TOWER') is 
analyzed by SEE, with the following results:

RESULTS

(BODY 1. IS :2 :3 :1)

(BODY 2. IS :15 :5 :4)

(BODY 3. IS :23 :17)

(BODY 4. IS :6 :7 :8)

(BODY 5. IS :10 :11 :9)                 Results for scene TOWER

(BODY 6. IS :13 :14 :12)

(BODY 7. IS :18 :22)

(BODY 8. IS :20 :19 :21)

Example B. Scene 'MOMO'. Details of the program's operation are
given (skip to next page, if you wish).

^Z $L SEE 1 (CR)        Go to DDT and load file SEE 1 (in tape
                        GUZMAN F), a binary dump of the program
                        SEE.

$G                      Start.

(UREAD MOMO SI 3) ^Q    Read the file MOMO SI (in tape GUZMAN C)
                        from tape drive 3.

(PREPARA MOMO)          Convert MOMO from its Input Format form
                        to Internal Format, the proper form that
                        SEE expects.

(SEE (QUOTE MOMO))      Call SEE to work on MOMO.

                        Results appear in next page.

Notes: ^Z (control Z) is keyed by striking the Z key while holding
       down simultaneously the CONTROL key.

       (CR) denotes carriage return.

       $ denotes the character "alt. mode".
SEE 58 ANALYZES MOMO
EVIDENCE
LOCALEVIDENCE
TRIANG

GLOBAL

((NIL) ((:38) G0044 G0043 G0041 G0040) ((:19) G0046 G0045 G0... etc.

LOCAL

(LOCAL ASSUMES (:17) (:9) SAME BODY)

(LOCAL ASSUMES (:9 :17) (:18) SAME BODY)

((NIL) (NIL) ((:6)) (NIL) (NIL) (NIL) ((:38 :37 :39) G0043 ... etc.

LOCAL

(((:3 :2 :1) G0031 G0029 G0030 G0028) ((:32 :33 :27 :20) G0... etc.

LOCAL

SMB

RESULTS

(BODY 1. IS :3 :2 :1)

(BODY 2. IS :32 :33 :27 :26)

(BODY 3. IS :26 :31)

(BODY 4. IS :20 :34 :19 :30 :29)        RESULTS FOR MOMO

(BODY 5. IS :36 :35)

(BODY 6. IS :24 :5 :21 :4)

(BODY 7. IS :25 :23 :22)

(BODY 8. IS :14 :13 :15)

(BODY 9. IS :10 :16 :11 :12)

(BODY 10. IS :17 :18 :9)

(BODY 11. IS :7 :6)

(BODY 12. IS :38 :37 :39)

NIL


Most of the scenes contain several "nasty" coincidences: a vertex of
an object lies precisely on the edge of another object; two nearly
parallel lines are merged into a single one, etc. This has been
done on purpose, since a non-sophisticated pre-processor will tend to
make this kind of error.

Example C. R3. Analysis by SEE gives

(BODY 1. IS %:2 %:1 %:4)

(BODY 2. IS %:6 %:5 %:3)                RESULTS FOR 'R3'

The % sign indicates the dextral scenes (cf. page ^33). The signs
may be ignored.
The Parts of SEE


The program is straightforward; it does not call
itself recursively; it does not do "pattern matching"; it does not do
tree search. It is formed by several main parts, sequentially
executed. They are

LINKS FORMATION. An analysis is made of vertices, regions and
     associated information, in search of clues that indicate that two
     regions form part of the same body. If evidence exists that
     two regions in fact belong to the same body, they are linked
     or marked with a "gensym" (both receive the same new label).*
     There are two kinds of links, called strong (global) or weak
     (local).

     Some features of the scene will weakly suggest that a group
     of regions should be considered together, as part of the same
     body. This part of the program is that which produces the
     'local' links or evidences.

NUCLEI CONSOLIDATION. The 'strong' links gathered so far are
     analyzed; regions are grouped into "nuclei" of bodies, which grow
     until some conditions fail to be satisfied (a detailed
     explanation follows later).

     Weak evidence is taken into account for deciding which of
     the unsatisfactory global links should be considered
     satisfactory, and the corresponding nuclei of bodies are then
     joined to form a single and bigger nucleus.

BODY RETOUCHING. If a single region does not belong to a larger
     nucleus, but is linked by one strong evidence to another region,
     it is incorporated into the nucleus of that other region. If
     necessary, more nuclei consolidation could be done after this
     step.

     A last attempt is made to associate the remaining single
     regions to other bodies.

The regions belonging to the background are screened out, and the
results are printed.

* In LISP, a "gensym" (generated symbol) is a new atomic symbol,
previously unused.
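The marking of links with gensyms can be pictured with a small sketch. It is written in Python rather than LISP and is only an illustration: the names `gensym`, `link` and `shared_links`, and the dictionary standing in for the property lists, are assumptions made for the example.

```python
import itertools
from collections import defaultdict

_counter = itertools.count(1)


def gensym():
    """Return a fresh label, never used before (G0001, G0002, ...)."""
    return f"G{next(_counter):04d}"


def link(links, r, s):
    """Lay one link between regions r and s: both receive the same new label."""
    g = gensym()
    links[r].append(g)
    links[s].append(g)
    return g


def shared_links(links, r, s):
    """Number of links between r and s = number of labels they have in common."""
    return len(set(links[r]) & set(links[s]))


links = defaultdict(list)
link(links, ":1", ":2")
link(links, ":1", ":2")
link(links, ":2", ":3")
```

With this representation, counting the links between two regions is a matter of intersecting their label lists; here :1 and :2 share two labels, :2 and :3 share one.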
Three functions are used constantly, and will be described now.


THROUGHTES

"Through a chain of T's." Allows properties or configurations
to extend along straight lines; for instance, the property
"'A' has as neighbor an L" [figure] can be extended so as
to say "throughtes, 'A' has as neighbor an L" [figure].

Strict definition. [symbol given by a figure] is defined as one of

(1) [figure] (meaning the two vertices in both sides of the
    symbol are in fact the same).

(2) [figure]

(3) [figure]

(4) [figure]

Example: [figure]. See also annotations on listing.

If a vertex V is considered a "good T", (GOODT V) is TRUE;
false otherwise.

(GOODT V) = F if V is not a 'T'

            F if [condition given by a figure]

            T if V has a NEXTE.

            F if [condition given by a figure]

            F if [condition given by a figure]

            T otherwise.
Nosabo tries to find conditions indicating that two regions should
not be considered as part of the same body; hence, if consulted,
Nosabo may forbid a link among them. Some heuristics place links
without asking Nosabo's approval, and Nosabo can not "erase" a link
placed without its authorization.

If none of conditions (1) to (5) is met, Nosabo will be False,
indicating no inhibition was found, and it is up to the program that
asked Nosabo's opinion to lay or fail to lay the link in question.
We proceed now to explain in considerable detail each of the parts
of SEE. This will help the reader to understand the behavior of
the program, its strengths and deficiencies.


LINK FORMATION


Several subroutines are devoted to creating weak and strong
links. See also listing.


CLEAN

Removes several unwanted properties.


EVERTICES

Each vertex is considered under the following rules:


L.-

No evidence is created directly by this type of vertex.
Nevertheless, the 'L' is used in many combinations
with other vertices to account for evidence. As we
saw, Nosabo uses L's. "Legs" will use them, too.


FORK.-

No link is created if any of the three regions is
background (but see below).

Example (unless otherwise indicated, all examples
are from figure 'BRIDGE', page 94): Vertex J
does not generate links.

Otherwise, three links are created as shown, except
that each one may be inhibited by Nosabo.

Example. Vertex JB only produces link :5-:8.
Link :5-:9 is inhibited because S is a 'T'; Nosabo
also forbids link :8-:9 because KB is an arrow.
This last rule is the most powerful of the heuristics.

Two links are created as shown, without asking Nosabo,
if the fork is connected to the central line of
an arrow.

Example: In fig. R19, PA generates links :29-:17
and :35-:17.

This last heuristic is of help where there are concave objects (Fig. R19).
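The first two FORK rules can be sketched as follows, in Python and only as an illustration. The name `fork_links` is hypothetical; Nosabo is represented by a function that answers True when it forbids a link (its actual conditions are not reproduced here), and the special case of a fork connected to the central line of an arrow is left out.

```python
from itertools import combinations


def fork_links(regions, background, nosabo):
    """Links proposed by a FORK whose three surrounding regions are `regions`.

    nosabo(r, s) is True when the link between r and s is inhibited.
    """
    if background in regions:
        return []                      # a fork touching the background gives no links
    return [(r, s) for r, s in combinations(regions, 2) if not nosabo(r, s)]


# Vertex JB of figure 'BRIDGE': Nosabo inhibits :5-:9 and :8-:9.
# The background name ":30" follows the text; the veto set is written by hand.
forbidden = {frozenset((":5", ":9")), frozenset((":8", ":9"))}
jb_links = fork_links([":5", ":8", ":9"], ":30",
                      lambda r, s: frozenset((r, s)) in forbidden)
```

Only the link :5-:8 survives, as in the example of vertex JB.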
ARROW.-

Link if an L is connected to its central line,
and the region shaded contains only that arrow
as a "proper-arrow," and no Forks.

Region :1 contains arrow A as a "proper-arrow"; also
region :2, but not region :3. Capisce?

Example. RB links :10 with :4.

Allows "lateral faces" of legs to be properly
identified and agglutinated.

Otherwise, link except if inhibited by Nosabo.
Example. D lays a link between :26 and :23.
Powerful and general heuristic.


X.-

No link if the X comes from the intersection
of two lines.

Otherwise, link as shown except if Nosabo disagrees.
Example. G originates links :26-:22 and :21-:30;
this last one will later be erased or disregarded,
since :30 is the background.


K.-

No link.


PEAK.-

Links are established between contiguous regions,
except those to the region containing the angle
bigger than 180°. These links are subject to
Nosabo inhibition.

Example. In fig. 'CORN', JJ generates links
:8-:9 and :9-:10.

Of certain use, especially with pyramids and
"pointy" objects.



MULTI.-

No link.

The reason is:

(1) if the vertex is "genuine" (cf. VV),
    although it generates no links, the object
    having it will probably possess many
    other vertices, through which links
    will get established, and
(2) if the vertex is "false" because it is the
    result of the casual coincidence of two
    or more genuine vertices, mistakes are
    avoided by abstaining from links.

    This is generally the case.

An improvement is possible, by allowing MULTI
vertices to place links. [SUGGESTION]


T.-

If matching T's, link as shown, without consulting
Nosabo. Avoid linking to the background.

Each pair of matching T's produces these links
only once; that is, we do not produce two links
while analyzing A and another two at B.

Do not link if the middle region of a 'T' is the
background.

What we are trying to do here is to find places
where a body appears as two disconnected parts.

Link (without Nosabo's consent) as shown if the
central segment of the 'T' separates two non-background
regions, and these have the background
as neighbor, and part of the separations between
background and non-background are parallel to the
central segment of the 'T'.

Avoid double links in the following case (link
just once): [figure]

Example. TA links :21 with :27 (F-G,
RA-TA and JA-IA are parallel).

Favors occluded bodies with parallel faces.
Also, see 'STUDY' in listing, still an
experimental feature.

Two links are placed as shown (without asking
Nosabo) if the central line of the T is
connected to the central line of an arrow.

It is of help where there are concave objects.


Table 'Global Evidence' shows compactly the main rules just discussed.


LOCALEVIDENCE

Weak or local links are laid here; they are used to
indicate, in a feebler way, that two faces or regions may be part of
the same object.

Nosabo can not inhibit local links.


LEG.-

A weak link is placed as shown (dotted) if,
Throughtes, an L is connected to an Arrow,
and the two indicated edges are parallel.

We call this configuration 'Leg'.
Example (all examples from figure 'BRIDGE',
except if otherwise indicated). Vertex FA is
a Leg (FA - QB is parallel to EA - DA)
that links weakly :18 with :19.

In a Leg, if there are two matching T's as
shown, a weak link is placed correspondingly.
Example. In fig. 'TRIAL' (page 88), a weak
link or evidence is placed between :7 and :4,
because EE is a Leg, and L and E are T's.













The heuristics described will sometimes produce a "wrong linkage,"
linking two regions that do not belong to the same body. These mistakes
are not likely to confuse SEE, since the handling of these links (and
all of SEE, in general) is done under the assumption or knowledge that
the information is noisy and somewhat unreliable.

Strong links are shown dotted; weak links are not shown.


TABLE 'GLOBAL EVIDENCE'
TRIANGLE

A Triangle is a 3-vertex region, of which
two are interconnected T's, the type of the
other vertex being irrelevant.

Two triangles are weakly linked if they are

(1) "facing each other", and

(2) "properly contained", meaning that D has
    to fall on the same side of AB as C does,
    and similarly for the other vertices, and

(3) AB is parallel to EF, and AC to DE.

The heuristic helps with faces of a prism
that is badly obscured. It does not help
much, since it gives only a weak link. On
the other hand, this weakness prevents mistakes
when the two triangles are not from
the same body.

A possible improvement







FIGURE 'TRIAL' 
The links could be represented as

Figure 'TRIAL - LINKS'

Strong (solid) and weak (broken lines)
links of figure 'TRIAL'.


SEE prints these links in the following way (cf. also p. 110):

:11 has four links emanating from itself.

((NIL) ((:11) G0014 G0013 G0011 G0010)
((:12) G0015 G0014 G0013 G0012) ((:13) G0021)
((:9) G0022 G0021 G0020 G0019 G0017 G0016)
((:10) G0015 G0012 G0011 G0010)
((:3) G0034 G0025 G0024)
((:4) G0033 G0032 G0026 G0025 G0023)
((:6) G0031 G0030 G0029 G0027)
((:5) G0026 G0023 G0022 G0018 G0017)
((:7) G0033 G0032 G0019 G0018 G0016)
((:8) G0034 G0020)
((:2) G0035 G0031 G0029 G0028) ((:14))
((:1) G0035 G0030 G0028 G0027))

Strong links of TRIAL.


Weak links of scene 'TRIAL' are

((:2 :1) (:6 :2) (:6 :1) (:4 :5) (:9 :5)
(:13 :9) (:3 :8) (:9 :8) (:4 :7) (:9 :7)
(:12 :10) (:11 :12))

Weak links of TRIAL.

The last but one sublist, for instance, shows that there is a weak
link between :12 and :10.
The next step is to gather all this evidence and to form tentative
hypotheses of objects as assemblages of faces with many links among
them.


NUCLEI CONSOLIDATION 

All the links to the background are deleted, since it can not 
be part of any body. 

Strong and weak links exist among the different regions of a 
scene. They are consolidated in that order by two subroutines, 
Global and Local. 


GLOBAL. Groups of faces with an abundance of strong links among them
are first found; these "nuclei" will later compete for other faces
more loosely linked.

Definition: a nucleus (of a body) is either a region or a set of
regions that has been formed by the following rule.

Rule: If two nuclei are connected by two or more strong links,
they are merged into a larger nucleus.

More detailed rules appear in page 2S, in section 'Simplified
view of Scene Analysis'.

For instance, in the figure below, regions :1 and :2 are put
together, because there exist two links among them, to form nucleus
:1-2. Now we see that region :3 has two links with this nucleus :1-2,
and therefore the new nucleus :1-2-3 is formed.

We let the nuclei grow and merge under the former rule, until 
no new nuclei can be formed. 
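The growth of nuclei under this rule may be sketched as a loop that merges until nothing changes. The Python below is an illustration under assumptions, not the subroutine GLOBAL: the name `maximal_nuclei` is hypothetical, each strong link is given as a pair of regions (one pair per link), and the sample data is made up to mirror the :1, :2, :3 example.

```python
def maximal_nuclei(regions, strong_links):
    """Merge nuclei joined by two or more strong links until none can merge."""
    nuclei = [frozenset([r]) for r in regions]   # every region starts as a nucleus
    merged = True
    while merged:
        merged = False
        for i in range(len(nuclei)):
            for j in range(i + 1, len(nuclei)):
                a, b = nuclei[i], nuclei[j]
                count = sum(1 for r, s in strong_links
                            if (r in a and s in b) or (r in b and s in a))
                if count >= 2:
                    nuclei = [n for k, n in enumerate(nuclei) if k not in (i, j)]
                    nuclei.append(a | b)
                    merged = True
                    break
            if merged:
                break
    return set(nuclei)


# :1 and :2 share two links; :3 has one link to :1 and one to :2;
# :4 has a single link to :3.
strong = [(":1", ":2"), (":1", ":2"), (":1", ":3"), (":2", ":3"), (":3", ":4")]
nuclei = maximal_nuclei([":1", ":2", ":3", ":4"], strong)
```

Region :3 joins only after :1 and :2 have merged, because its two links go to the nucleus :1-2 and not to either region alone; :4 stays apart with its single link.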
When this is the case, the scene has been partitioned into
several "maximal" nuclei; between any two of these there is at most 
one link. For example, figure 'TRIAL-LINKS' will be transformed into 
figure 'TRIAL-NUCLEI'. 



Figure 'TRIAL - NUCLEI'
Maximal nuclei of scene TRIAL.


LOCAL. If some strong link joining two "maximal" nuclei is also
reinforced by a weak link, these nuclei are merged.

The weak links of figure TRIAL are shown as dotted lines in
figure 'TRIAL-LINKS' (page 90); they transform figure 'TRIAL-NUCLEI'
into figure 'TRIAL-FINAL', as suggested by local links.
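The merge of nuclei by reinforced links may be sketched as follows. This is a Python illustration with assumptions: "reinforced" is taken to mean that a weak link joins the same two regions as the strong link, every region mentioned in a link belongs to some nucleus, and the name `merge_reinforced` is hypothetical.

```python
def merge_reinforced(nuclei, strong_links, weak_links):
    """Merge two nuclei when a strong link between them is doubled by a weak one."""
    weak = {frozenset(pair) for pair in weak_links}
    nuclei = [set(n) for n in nuclei]
    for r, s in strong_links:
        if frozenset((r, s)) not in weak:
            continue                       # this strong link is not reinforced
        a = next(n for n in nuclei if r in n)
        b = next(n for n in nuclei if s in n)
        if a is not b:
            a |= b
            nuclei = [n for n in nuclei if n is not b]
    return {frozenset(n) for n in nuclei}


# Made-up data: the single strong link :3-:4 is reinforced, :4-:5 is not.
result = merge_reinforced(
    [{":1", ":2", ":3"}, {":4"}, {":5"}],
    strong_links=[(":3", ":4"), (":4", ":5")],
    weak_links=[(":4", ":3")],
)
```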


BODY RETOUCHING 

Additional heuristics assign unsatisfactory faces to existing 
nuclei, or isolate them. SINGLEBODY and SMB are used for this task. 
SINGLEBODY


A strong link joining a nucleus and another nucleus composed
of a single region is considered enough evidence to merge the nuclei in
question if there is no other link emanating from the single region. A
message is printed indicating these merges.

Such rules produce no change in fig. 'TRIAL-FINAL', and therefore
its nuclei will be reported as bodies.

A more complex example shows the retouching operation. Figure
'BRIDGE' undergoes these transformations:

Scene BRIDGE.                                 Fig. 'BRIDGE'

Weak and strong links among regions.          Fig. 'LINKS-BRIDGE'

Maximal nuclei (2 or more strong links).      Fig. 'NUCLEI-BRIDGE'

Maximal nuclei enlarged.                      Fig. 'NEW-NUCLEI-BRIDGE'

Id. enlarged by single undisputed regions.    Fig. 'FINAL-BRIDGE'

Id. enlarged by good neighbors, "goodpal".    Fig. 'FINAL-BRIDGE'
Final result.                                 (no change in this
                                              case).
We see that in figure 'NEW-NUCLEI-BRIDGE', nucleus :16 is merged
by SINGLEBODY with nucleus :18-19 (see figure 'FINAL-BRIDGE'). Nucleus
:28-29 is not joined with :26-22-23 or with :24-25-27-12-21-9. Even if
nucleus :28-29 were composed of a single region, it still would not be
merged, since two links emerge from it: two nuclei claim its possession.

This rule joins single regions having only one possible "owner"
nucleus.
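The rule of SINGLEBODY can be sketched in Python as below. This is an illustration, not the subroutine itself: the name `singlebody` is hypothetical, `links` is assumed to hold one pair of regions for every link still joining different nuclei, and the case of region :3 in the sample data is made up to show a disputed region.

```python
def singlebody(nuclei, links):
    """Absorb every single-region nucleus that has exactly one link."""
    nuclei = [set(n) for n in nuclei]
    singles = [next(iter(n)) for n in nuclei if len(n) == 1]
    for region in singles:
        home = next(n for n in nuclei if region in n)
        if len(home) != 1:
            continue                   # it already absorbed another region
        touching = [pair for pair in links if region in pair]
        if len(touching) != 1:
            continue                   # no owner, or several nuclei claim it
        r, s = touching[0]
        other = s if r == region else r
        owner = next(n for n in nuclei if other in n)
        if owner is not home:
            owner |= home
            nuclei = [n for n in nuclei if n is not home]
    return {frozenset(n) for n in nuclei}


bodies = singlebody(
    [{":16"}, {":18", ":19"}, {":3"}, {":1", ":2"}, {":4", ":5"}],
    links=[(":16", ":18"), (":3", ":1"), (":3", ":4")],
)
```

Nucleus :16 is absorbed by :18-19, while :3 stays alone because two nuclei claim it.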

SMB

Two systems of links are used by SEE. One consists of weak and
strong links, produced by examining each vertex, and culminates forming
nuclei under GLOBAL, LOCAL, etc.

The second system constitutes a different network of links; SMB
works in the second system. It is motivated by the desire to collect
evidence not directly available through the vertices. It gathers
evidence from the lines or boundaries separating two regions, in an
effort to answer the question: Are two given neighboring regions part
of the same object, or are they not? That is, are two contiguous regions
"good neighbors" ("good pals")? If they are, a special link, s-link,
is placed, eventually forming a network independent of weak and strong
links, that will collapse, in a somewhat peculiar way. Thus, a great
amount of unnecessary duplication could be possible in the information
carried by both systems of links. To reduce it, the s-links are designed
to complement and extend, rather than to re-do, the agglutination
produced by weak+strong links. They (the s-links) will, therefore, mainly
study single faces not satisfactorily accounted for.

SMB uses the predicate (GOODPAL R S), which acquires the value T 
(true) if R and S are two contiguous "good neighbors" regions. 

To satisfy this, their common boundary must not be empty, and must 
lack L's, FORKS, ARROWS, K's, X's, PEAKs, MULTIs. In addition: 

Not good: (GOODPAL R S) = F [configurations given by figures]

O.K. otherwise: (GOODPAL R S) = T.

In particular, [configuration given by a figure]
is O.K. if (NOSABO R S) = F.


SMB analyzes the nuclei formed under weak+strong links that, after
SINGLEBODY actuation, still remain formed by a single face or
region. The steps are:

1. A network of s-links is formed by putting an s-link between regions
   forming a nucleus all by themselves, and their goodpal neighbors.

2. If exactly one nucleus is s-linked to one of those regions (that
   is to say, if such single-region single-nucleus has precisely
   one good-pal), the region gets absorbed by the nucleus; otherwise
   the region is reported as a body in itself (consisting of a
   single region).

[Figure] does not change because :3 has two s-links.

Note that

a. The s-links are not used to form nuclei as the weak+strong links
   were; they only help certain isolated faces to join bigger
   structures.

b. Two s-links between two regions have the effect of one.

Example. In figure 'HARD', regions :6 and :7 get joined by SMB.
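The two steps of SMB may be sketched as follows. The Python is an illustration under assumptions: the name `smb` is hypothetical, `neighbors` gives for each region its contiguous regions, `goodpal` stands for the predicate (GOODPAL R S) whose actual conditions are not reproduced, and "exactly one nucleus is s-linked" is read as "exactly one good-pal region". The sample data for HARD is reduced by hand to the regions involved.

```python
def smb(nuclei, neighbors, goodpal):
    """Let single-region nuclei with exactly one good pal join that pal's nucleus."""
    nuclei = [set(n) for n in nuclei]
    singles = [next(iter(n)) for n in nuclei if len(n) == 1]
    # Step 1: s-links; a set is used so that two s-links count as one.
    pals = {r: {s for s in neighbors.get(r, []) if goodpal(r, s)} for r in singles}
    # Step 2: absorb the regions that have precisely one good pal.
    for r in singles:
        if len(pals[r]) != 1:
            continue                   # reported later as a body in itself
        home = next(n for n in nuclei if r in n)
        owner = next(n for n in nuclei if next(iter(pals[r])) in n)
        if owner is not home:
            owner |= home
            nuclei = [n for n in nuclei if n is not home]
    return {frozenset(n) for n in nuclei}


good = {frozenset((":6", ":7"))}
joined = smb(
    [{":6"}, {":7"}, {":1", ":2", ":33"}],
    neighbors={":6": [":7", ":1"], ":7": [":6"]},
    goodpal=lambda r, s: frozenset((r, s)) in good,
)
```

A region with two different good pals, like :3 in the figure mentioned above, is left untouched by this procedure.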
FIGURE 'HARD'


SEE 56 ANALYZES HARD
EVIDENCE
LOCALEVIDENCE
TRIANG

GLOBAL

((NIL) ((:34)) ((:6)) ((:36)) ((:24) G0026 G0025 G0023 G0... etc.
0044 G0043 G0042) ((:17) G0047 G0046 G0045 G0044) ((:7)) ... etc.
0041 G0039) ((:21) G0050 G0040 G0039 G0029 G0026 G0027) (
0036 G0036 G0019) ((:26) G0054 G0053 G0037 G0036) ((:27) ...
G0055 G0023 G0020 G0015) ((:32) G0057 G0056 G0034 G0033)

6 G0046) ((:4) G0056 G0046) ((:10) G0059 G0032 G0031) ((:
:19) G0064 G0063 G0062 G0061) ((:20) G0064 G0062 G0060 G0...
:30) G0056 G0035 G0033 G0016) ((:15) G0066) ((:16) G0066)
((NIL) ((:34)) ((:6)) ((:36)) (NIL) (NIL) (NIL) (NIL) ((:
019 G0053 G0036 G0054 G0036 G0037 G0019) (NIL) ((:24 :22
0040 G0039 G0029 G0026 G0027 G0024 G0022 G0055 G0023 G002
) (NIL) ((:5 :4) G0046 G0058 G0046) (NIL) ((:13 :17 :14)
:18 :19 :20) G0060 G0064 G0063 G0061 G0064 G0062 G0060 G0
:32 :31 :30) G0033 G0057 G0034 G0056 G0035 G0033 G0016) (

LOCAL

(LOCAL ASSUMES (:11) (:12) SAME BODY)

(LOCAL ASSUMES (:15) (:16) SAME BODY)

((NIL) ((:34)) ((:6)) ((:36)) (NIL) (NIL) ((:7)) (NIL) (N
019) ((:24 :22 :3 :23 :21 :28 :29) G0020 G0026 G0025 G004
0055 G0023 G0020 G0015) ((:1 :2 :33) G0052 G0051 G0017 G0

43 G0047 G0046 G0044 G0047 G0045 G0043 G0042) (NIL) ((:16
:10 :8) G0032 G0032 G0065 G0059 G0031 G0030) ((:32 :31 :

) (NIL) ((:35)) ((:12 :11) G0067) (NIL))

LOCAL

(((:12 :11) G0067) ((:16 :15) G0066) ((:32 :31 :30) G0033
G0065 G0059 G0031 G0030) ((:18 :19 :20) G0060 G0064 G0063
6 G0044 G0047 G0045 G0043 G0042) ((:5 :4) G0046 G0056 G00

3 :21 :28 :29) G0020 G0026 G0025 G0049 G0041 G0021 G0050
15) ((:25 :26 :27) G0019 G0053 G0036 G0054 G0036 G0037 G0

LOCAL

SMB

(SMB ASSUMES :7 :6 SAME BODY)

RESULTS

(BODY 1. IS :12 :11)

(BODY 2. IS :16 :15)

(BODY 3. IS :32 :31 :30)

(BODY 4. IS :9 :10 :8)

(BODY 5. IS :18 :19 :20)                RESULTS FOR HARD

(BODY 6. IS :13 :17 :14)

(BODY 7. IS :5 :4)

(BODY 8. IS :1 :2 :33)

(BODY 9. IS :24 :22 :3 :23 :21 :28 :29)

(BODY 10. IS :25 :26 :27)

(BODY 11. IS :7 :6)

NIL
RESULTS. After having screened out the regions that belong to the
background, the nuclei are printed as "bodies".

In this process, the links which may be joining some of the
nuclei are ignored: RESULTS considers the links of figure
'FINAL-BRIDGE', for instance, as non-existent. These links
are the result of imperfections in the heuristics or mistakes in the
placement of links, and may point out different parsings. An
improvement to SEE will be to try to "explain" these residual links.

Summary. SEE uses a variety of kinds of evidence to link together
regions of a scene. The links in SEE are supposed to be general
enough to make SEE an object-analysis system. Each link is a piece
of evidence that suggests that two or more regions come from the
same object, and regions that get tied together by enough evidence
are considered as "nuclei" of possible objects.

Examples and discussion are in the next section.
ANALYSIS OF MANY SCENES


Until we have an adequate analytic theory, the behavior of a
heuristic program is best understood with examples. There are
several ways to go about this:

Simple. In order to learn what a program does, simple examples, each
one illustrating a single feature or group of features, are very
appropriate.

Favorable. A shiny impression of a set of routines is obtained by
presenting 'favorable' cases, designed to enhance the characteristics
of the program in front of the unsophisticated observer.

Of course, of all possible inputs, there is a subset that will
produce outputs very pleasant in terms of speed, easiness of
programming, generality, accuracy, or whatever other feature that
system advertises. This subset tends to get the highlights in the
descriptions.

Nasty. Examples in which the program does particularly poorly are
useful, if well chosen, to illustrate the weak points and pitfalls
of the techniques used, the restrictions and constraints in the input,
etc. They may point out improvements or extensions.

Silly. Examples having very weak connection with the purpose or
intention of the routines or algorithms discussed serve no useful
end, except perhaps to point out that the maker of such examples did
not understand the issues. For instance, one could take a box full
of pins, drop them on the table, take their picture and ask SEE to
work on it.


A collection of simple, favorable, and nasty examples follows.
They are not in that order.


A discussion is found at the end of this section.
Stereo Scenes. Analysis of stereographic pictures will be found in
the section 'Stereo Perception'.

Finding the background. Examples where the background is not known
in advance and has to be deduced are given in the section 'Background
Discrimination by Computer'.
LIST OF SCENES ANALYZED BY SEE IN THIS SECTION


                              PAGE

Name.      Comments.   Scene (figure).   Computer Results.

R17        107         108               109
L3         110         111               112
R3         113         114               115
SPREAD     116         117               118
STACK      119         120               122
STACK*     119         121               122
L10        123         124               125
R10        126         127               128
TOWER      129         130               131
REWOT      132         133               134
WRIST*     135         136               137
L2         138         141               142
R2         138         139               140
L19        143         144               145
R19        146         147               148
CORN       149         150               151
L9         152         153               154, 155
R9         156         158               157
R9T        156         159               160
TRIAL      161         162               163
ARCH       164         165               166
HARD       167         168               169
L4         170         171               172
R4         173         174               175
MOMO       176         177               178
BRIDGE     179         180               181

Scene R17


The three prisms are found. In scenes like this, the
position of one or two vertices may alter the analysis made by SEE,
by radically changing the slope (direction) of a small segment (such
as KL and GH, figure 'R17'), killing several T-joints and separating
regions :1-2 from :5-6.

Small errors in the coordinates of vertices K, L, G, H, and a few
others will drastically change the slope of segments of short length.
This will transform G and K into Arrows or Forks, so that G and K
will no longer be matching T's (cf. also 'Conservatism and Tolerance',
page 173). As a consequence, body :2-1 will be disconnected from body
:5-6. This annoying problem is not difficult to correct, at the
preprocessor level, since there is good information about the slope of the
(long) line BN: the slope of KL has to agree with the slope of
BN, giving a good estimate of its true shape.

SUGGESTION  The rule seems to be that these short segments should be
"re-oriented" if necessary, to agree with the longer ones, which are
more reliable. Deeper analysis is found in section 'On Noisy Input'.

SUGGESTION  The preprocessor should consider the hypothesis
that BKLN are colinear, or SEE should propose it
for confirmation (see 'Division of Work in Computer Vision', p. ).
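
The re-orientation suggested for short segments can be sketched as follows. This is only an illustration in Python (SEE itself is written in LISP), under the assumption that "agreeing with the longer line" means projecting the doubtful vertices orthogonally onto it; the function names, the `max_offset` parameter and the coordinates are hypothetical, not taken from the program.

```python
import math


def project_onto_line(p, a, b):
    """Orthogonal projection of point p onto the infinite line through a and b."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (dx * dx + dy * dy)
    return (ax + t * dx, ay + t * dy)


def straighten(points, a, b, max_offset):
    """Snap every point lying within max_offset of line ab onto that line.

    Points farther away are assumed to be genuinely off the line and are
    left untouched.
    """
    result = {}
    for name, p in points.items():
        q = project_onto_line(p, a, b)
        result[name] = q if math.dist(p, q) <= max_offset else p
    return result


# Hypothetical data: B and N are the ends of the long, reliable line;
# K and L are noisy vertices of a short segment; Z lies elsewhere.
B, N = (0.0, 0.0), (100.0, 0.0)
noisy = {"K": (40.0, 0.8), "L": (45.0, -0.6), "Z": (50.0, 10.0)}
fixed = straighten(noisy, B, N, max_offset=1.0)
```

After the call, K and L lie on BN, so the slope of KL agrees with the slope of BN, while Z keeps its position.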

The % signs  In the printouts of some scenes, such as R17 (see 'RESULTS
FOR R17' in page 107), a % sign appears as part of the name of every
region and vertex; that is, %:3 instead of :3. This will be the case
in all scenes having names starting with the letter R, differentiating
the "right regions" from the "left regions". This will become clear
in the section 'Stereo Perception', page *33; until then, disregard
the %'s.

FIGURE 'R17'

The three prisms were correctly found.
There are several "nasty" coincidences
in this scene, simulating the data
that a not-too-satisfactory preprocessor
will tend to provide.

Scene L3


Without difficulty, two bodies are found. Each region
has four strong links relating it with other regions (see
'RESULTS FOR L3'). LOCAL is not needed to form nuclei; neither
is SINGLEBODY or SMB.


Explanation of the printout produced by the program  In page 112 a
printout of the results appears. The format is the same for every
scene. It starts by saying

SEE 58 ANALYZES L3

which identifies the name of the program (SEE), its number (version
number 58), and the scene to be analyzed (L3).

EVIDENCE
LOCALEVIDENCE
TRIANG
GLOBAL

The different sections of the program print their names when they
are entered.

We then come to a list containing regions (such as :6) and 'gensyms'
(such as G0009):

((NIL) ((:6) G0009 G0007 G0005 G0004) ((:5) G0010 G0006
G0007 G0004) ((:4) G0010 G0009 G0006 G0005) ((:1) G0015
G0013 G0012 G0011) ((:2) G0016 G0014 G0013 G0011)
((:3) G0016 G0015 G0014 G0012) ((:7)))


This list contains the nuclei and the links (strong links); the first
nucleus that we see is ((:6) G0009 G0007 G0005 G0004), meaning
that from nucleus (or region) :6 emanate four links, namely G0009,
G0007, G0005 and G0004. We can represent this graphically:



name), then the list of nuclei again, this time shrunk somewhat by
LOCAL; finally, we see "RESULTS", and then 2 bodies, followed
by NIL, meaning the end of the program. (See page 112).
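
The list of nuclei can be read as a graph whose nodes are nuclei and whose edges are the gensyms: a gensym that appears in two nuclei is a link between them. The Python sketch below (SEE itself is written in LISP) reads the list for L3 that way, with the gensyms written G0004, G0005, and so on. It assumes that two nuclei sharing two or more strong links are merged into one, which is how the comments to scene TOWER describe the effect of links; the function name `merge_nuclei` and the data layout are illustrative only, and the later passes (LOCAL, SINGLEBODY, SMB) are not modeled.

```python
from itertools import combinations

# Nuclei of scene L3 as listed in the printout: (regions, links emanating).
NUCLEI_L3 = [
    ((":6",), ("G0009", "G0007", "G0005", "G0004")),
    ((":5",), ("G0010", "G0006", "G0007", "G0004")),
    ((":4",), ("G0010", "G0009", "G0006", "G0005")),
    ((":1",), ("G0015", "G0013", "G0012", "G0011")),
    ((":2",), ("G0016", "G0014", "G0013", "G0011")),
    ((":3",), ("G0016", "G0015", "G0014", "G0012")),
    ((":7",), ()),
]


def merge_nuclei(nuclei, threshold=2):
    """Repeatedly merge any two nuclei that share at least `threshold` links."""
    work = [(set(regions), list(links)) for regions, links in nuclei]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(work)), 2):
            shared = set(work[i][1]) & set(work[j][1])
            if len(shared) >= threshold:
                regions = work[i][0] | work[j][0]
                # Links that have become internal to the new nucleus are dropped.
                links = [g for g in work[i][1] + work[j][1] if g not in shared]
                work = [n for k, n in enumerate(work) if k not in (i, j)]
                work.append((regions, links))
                merged = True
                break
    return sorted(sorted(regions) for regions, _ in work)


groups = merge_nuclei(NUCLEI_L3)
```

For L3 this leaves the groups :1-2-3 and :4-5-6, plus the isolated region :7, which has no links at all.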

FIGURE 'L3'

Two bodies are found in this scene
by our program. The input data
indicated that region :7 is the
background.

Scene R3


Two bodies are found in this scene. Vertex F is
classified as of type 'T', hence only one link exists between
:2 and :4.

All scenes have regions, vertices and lines (edges) joining
vertices and separating regions. We generally omit the names of the
vertices from the drawing (figure 'R3'); we are also omitting the
coordinate axes.

Since each region has an inside and an outside, the following
are invalid or illegal configurations in a scene:



A line ending nowhere: illegal.



Our scenes should be such that,
to disconnect a separate component
of the graph (into two components)
we have to remove (delete) at least
two edges. The graph above is
"illegal" as input to our program,
since the criterion is not met:
removing edge E will disconnect
the graph (cf. page 11 ).

Incidentally, some optical
illusions are "recognized" or rejected
because they come from illegal
scenes of the type just described
(cf. section 'Optical Illusions').

See 'Illegal scenes', page 217, in section 'On noisy input.'
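
The legality criterion stated above (no single edge whose removal disconnects the graph) is the absence of what graph theory calls a bridge. A minimal Python sketch of such a check follows; it is not part of SEE, it uses the standard depth-first "low-link" method, and the function names and the example scene are hypothetical. A line ending nowhere is caught too, since its last edge is always a bridge.

```python
def find_bridges(edges):
    """Return the edges whose removal would disconnect their component."""
    adjacency = {}
    for index, (u, v) in enumerate(edges):
        adjacency.setdefault(u, []).append((v, index))
        adjacency.setdefault(v, []).append((u, index))

    order, low, bridges = {}, {}, []

    def visit(u, parent_edge):
        order[u] = low[u] = len(order)
        for v, index in adjacency[u]:
            if index == parent_edge:
                continue  # do not walk back along the edge we came by
            if v in order:
                low[u] = min(low[u], order[v])
            else:
                visit(v, index)
                low[u] = min(low[u], low[v])
                if low[v] > order[u]:
                    bridges.append(edges[index])

    for start in adjacency:
        if start not in order:
            visit(start, None)
    return bridges


def is_legal_scene(edges):
    """A scene is acceptable here if at least two edges must go to split it."""
    return not find_bridges(edges)


# Hypothetical scene: two quadrilaterals joined by the single edge C-P.
SQUARE_1 = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
SQUARE_2 = [("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "P")]
EDGE_E = [("C", "P")]
```

Each quadrilateral alone is legal; joining them by the single edge C-P makes the scene illegal, because removing that edge disconnects the graph.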

FIGURE 'R3'


A scene analyzed by the program.

Scene SPREAD


Body :41-42 was found; also :8-18-19. In the first
case, there was one strong link between :41 and :42, because of
heuristic (g) of table 'GLOBAL EVIDENCE' (page 87), and SINGLEBODY
completed the object. In the second case, heuristic (g) could not
be applied, and SMB had to join :19 with :18.

Bodies :29-30-31-32 and :25-26-27-28 are adequately found.
Also the badly occluded long body :10-9-11-12-3 is found.

Body :21-6-5-20 is found as one body. An older version of
SEE {Guzman FJCC 68} used to report two: :6-21 and :5-20. The
change is as follows: one link is between :6 and :5 because of
the matching T's, the other link is a weak one placed because :5 and :20
form a LEG; a weak link is also placed between :6 and :5.

:24 gets reported isolated, instead of together with :22-23,
because no Leg is seen; but see comment (page 30) in section 'Simplified
View of Scene Analysis'.

SEE tries to find a "minimal" answer; minimal in the sense
that it will try to explain the scene with the minimum possible number
of bodies (cf. section 'The Concept of a Body'). That is the
reason it joined :41 and :42 in one body, instead of two, which
is another possible correct answer. That is also true of :19-18-8,
interpreted as one parallelepiped with a vertical face (:19) and a
horizontal face (:18-8).

The background of SPREAD is also computed (see page 226 of section
'Background Discrimination by Computer').

SPREAD



Bodies :10-9-11-12-3 and :6-21-5-20 are properly found. The body
:19-18-8, which is a parallelepiped with a vertical face (:19) and
a horizontal face (:8-18), is also correctly identified.

Scenes STACK and STACK*


In both cases all the bodies were accurately
identified by our program, which is written in LISP. In both cases
the body :4-15-16 is found.

These scenes show that in many instances one could drastically
alter the position of a vertex, without modifying the output of SEE
(compare figure 'STACK' with 'STACK*').

Other examples would show that the vertices of type 'L' can be
arbitrarily displaced, so long as their type remains 'L' and other
vertices do not change type, without detrimental effect. This displacement
may possibly affect some heuristics that use concepts of
parallelism or colinearity, but not the rules that use the shape or
type of a vertex (cf. table 'VERTICES', page 69) for placing and
inhibiting links. Read 'Misplaced vertices' in page 2 H , in section
'On noisy input.'

FIGURE 'STACK'

Every body is correctly identified. Compare with scene STACK*.
This pair of drawings illustrates the fact that it is often
possible to disturb the coordinates (the position) of a vertex,
without introducing errors in the recognition.

Scene L10


The concave object :11-15-14-7-6 presents no
problem, since there are plenty of visible vertices
(figure 'L10'), and SEE makes good use of them.

SINGLEBODY is necessary to join regions :13 and :2.

The bodies of a scene do not need to be
prismatic in shape, nor convex. Their vertices could
have errors in their two-dimensional position. Table
'ASSUMPTIONS' (page 255) specifies the suppositions that
our program obeys.

FIGURE 'L10'


SINGLEBODY had to join :2 with :13.

All four bodies were happily identified.

Scene R10


Four bodies are found by our program in R10.

The scene is a good example of a "noisy" scene, in which edges that
should be straight look crooked. This is because the coordinates
of each vertex are "imprecise"; the vertices have some error in
their coordinates. Other scenes also show this tendency; they
accurately represent the data analyzed by SEE (the scenes in their
final form were drawn by program, then inked manually), and should
not be considered as "sloppy drawing jobs".

SEE has several ways to cope with these imperfections:

(1) tolerant definitions of parallelism and colinearity.

(2) insensitivity of heuristics to displacements of the vertex.
For instance, vertex V will inhibit the link that Z proposes,
either when V is of type 'Arrow' or when it is of type 'T'
(but not when 'Fork'):

(3) Large variations in the coordinates of a vertex are possible
before that vertex changes type. Vertices of type 'T' are an
exception, changing into a Fork or an Arrow by a small displacement.

Nevertheless, it is possible to "straighten" these vertices,
by following the suggestion in the comments to scene R17.

The section 'On Noisy Input' deals with these matters.
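
Point (3) can be made concrete with a small Python sketch (not taken from SEE, which is written in LISP). It assumes the usual definitions for vertices where three lines meet: a 'T' has two of its lines colinear, an 'Arrow' has one angle larger than 180 degrees, and a 'Fork' has all three angles smaller than 180 degrees. The tolerance `tol`, expressed here as a bound on the sine of the deviation from a straight line, is a stand-in for the program's own tolerances, and all coordinates are hypothetical.

```python
import math


def classify_three_line_vertex(vertex, ends, tol=0.05):
    """Classify a vertex where three lines meet as 'T', 'ARROW' or 'FORK'.

    `ends` holds the far end points of the three lines leaving `vertex`.
    """
    angles = sorted(math.atan2(y - vertex[1], x - vertex[0]) for x, y in ends)
    # Angular gaps between consecutive rays; the three gaps add up to 360 degrees.
    gaps = [(angles[(i + 1) % 3] - angles[i]) % (2 * math.pi) for i in range(3)]
    widest = max(gaps)
    # Two rays (nearly) opposite each other: the crossbar of a T.
    if abs(math.sin(widest)) <= tol and math.cos(widest) < 0:
        return "T"
    return "ARROW" if widest > math.pi else "FORK"


V = (0.0, 0.0)
exact = [(-10.0, 0.0), (10.0, 0.0), (0.0, -10.0)]    # a perfect T
raised = [(-10.0, 0.0), (10.0, 2.0), (0.0, -10.0)]   # one end moved up
lowered = [(-10.0, 0.0), (10.0, -2.0), (0.0, -10.0)] # one end moved down
```

With the default tolerance the three cases classify as 'T', 'FORK' and 'ARROW' respectively, so a small displacement of one end point is enough to change the type. Calling the function on `raised` or `lowered` with a looser `tol=0.25` returns 'T' again, which is the effect that too-large tolerances have in scenes L9 and R4.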

FIGURE 'R10'

The scene contains "noisy" vertices; hence, some
edges look bent. SEE has resources to cope with these
problems.

Figures L10 and R10 form a stereo pair. In figure
'L10 - R10' in page ZiT , information from both scenes
is combined to find the position of these objects in
three-dimensional space. See section 'Stereo Perception'.

Scene TOWER


There is no need to make use of LOCAL or SINGLEBODY
in this scene, since there are plenty of global (strong) links
among the different regions. :18-22 and :17-23 get links thanks
to the heuristic that analyzes vertices of type 'X'.

There are several "false" vertices, formed by coincidences of
edges and "genuine" vertices: the vertex common to :9, 11, 12 and 13;
the one common to :2, 4, 5, 6. They do not cause problems, because

(1) in the case of the vertex common to :9, 11, 12 and 13, it is of
type 'MULTI', and no link is laid.

(2) in the case of the vertex shared by regions :2, 4, 5, and 6,
it is an 'X' that will establish one link between :4 and :5 (which
is correct), and another between :2 and :6 (which will do no
harm, since we need two "wrong" or misplaced links to cause a
recognition mistake).

Compare with scene 'REWOT'.

FIGURE 'TOWER'

A "wrong" link is placed between :2 and :6,
without serious consequences. Results for
this scene are in 'RESULTS FOR TOWER'.


RESULTS FOR TOWER

[Printout not legible in the source.]








Scene REWOT 


This scene (see figure 'REWOT') is the same as the
scene TOWER (see figure 'TOWER'), but upside down. The program
obtains identical results for both scenes (see 'Results for Tower'
and 'Results for Rewot'), because SEE does not use information about
a body supporting or leaning on another body. For instance, it
was not assumed that body :1-2-3 is partially supporting (in figure
'TOWER') body :4-5-15; clearly this assumption fails in the case of
figure 'REWOT'. But since the assumption is not made, the program
succeeds in both cases (gives the same results).

See table 'ASSUMPTIONS' (page 255) for suppositions that the
program makes or presumptions that it does not need.

The regions :16 and :24 had to be marked as part of the
background, following standard practice (cf. 'Input Format').

FIGURE 'REWOT'

This scene is the same as the scene TOWER,
but with Y replaced by 100. - Y, and
X replaced by 100. - X: it is upside
down. SEE still finds eight bodies.
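
The substitution that turns TOWER into REWOT is a half-turn of the picture about its center. The Python sketch below (illustrative only, with hypothetical coordinates and function names) applies that substitution and checks one simple geometric fact that helps explain the identical results: the sense of turn of any three points, on which distinctions such as Fork versus Arrow rest, is unchanged by it.

```python
def turn_upside_down(vertices, size=100.0):
    """Replace X by size - X and Y by size - Y for every vertex."""
    return {name: (size - x, size - y) for name, (x, y) in vertices.items()}


def turn_sign(a, b, c):
    """+1 if a, b, c turn counterclockwise, -1 if clockwise, 0 if colinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


# Hypothetical vertices, not taken from scene TOWER.
scene = {"A": (10.0, 10.0), "B": (60.0, 20.0), "C": (30.0, 70.0)}
flipped = turn_upside_down(scene)
```

Both differences of coordinates change sign under the substitution, so their cross product, and with it the sense of turn, stays the same.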

RESULTS FOR REWOT

[Lists of nuclei and links: not legible in the source.]

LOCAL
SMB

RESULTS
(BODY 1. IS :2 :3 :1)
(BODY 2. IS :15 :5 :4)
(BODY 3. IS :23 :17)
(BODY 4. IS :6 :7 :8)
(BODY 5. IS :10 :11 :9)
(BODY 6. IS :13 :14 :12)
(BODY 7. IS :18 :22)
(BODY 8. IS :20 :19 :21)

Scene WRIST*


The concave objects are properly identified. W places
a link between :23 and :4, and another between :30 and :4. CC does
not inhibit the link between :17 and :19 ordered by the Arrow NA,
because NOSABO was never called, since the first rule of 'ARROW'
(page ) was applied.

The only mistake was that objects :9-7-6 and :10-5 should be
fused and reported as only one. There is a link between :9 and :10
put by heuristic (g) of table 'GLOBAL EVIDENCE'. It is not enough.
There is also a weak link between 'Triangles' :5 and :6. OB is not
a 'Leg', so there is no weak link between :10 and :5. The situation
is as follows (see chains of links in 'RESULTS FOR WRIST*'; how to
read these chains is explained in page 110, 'Explanation of the
printout produced by the program'):



Almost the same thing occurs with :1-2-22-21, but in this case
vertex A produces one strong link between :22 and :21, and vertex R, by
heuristic (g) of table 'GLOBAL EVIDENCE', also links :22 with :21. This

FIGURE 'WRIST*'

Instead of one, two bodies were found in :9-7-6 and :10-5.
Insufficiency of links was the offending reason. All other
objects were correctly found.

Scenes L2 and R2


Two objects are found, as expected.

These scenes form a stereographic pair: two pictures taken of
the same scene from slightly different locations, maintaining parallel
the optical axes of the cameras, and the same magnification. A program,
not yet completed, is designed with the following ideas:

Left and right pictures are independently processed by SEE; L2 and
R2 in this example. The answers are

ANALYSIS OF L2              ANALYSIS OF R2

(BODY 1. IS :2 :4)          (BODY 1. IS %:1 %:2 %:4)

(BODY 2. IS :1 :5 :3)       (BODY 2. IS %:3 %:6 %:5)

The question is now: Is body :2-:4 the same body as %:1-%:2-%:4,
or is it %:3-%:6-%:5? It is required, after decomposition of the
scene into bodies, to match the left bodies with the right bodies.

If this is accomplished, one could then locate the figure in
three-dimensional space, from the two-dimensional coordinates of the figure
in the left and right scenes.

In this way it will be known where these objects are located in
the "real world".

This "matching" mentioned above is complicated as follows:

-- It is possible that the number of objects observed in one view
is different from the number in the other.

-- On a given object, it is possible that SEE will make a mistake
in the left view, but not in the right view; as a consequence,
two bodies on the left have to be matched with one on the right.

If the axes of the two cameras are on a horizontal plane, a vertex
in the left scene and its corresponding vertex in the right scene
(if visible) will have the same y-coordinate, such as H in L2 and
%I in R2. Other known relations exist, derived from the relative
position of the axes of the cameras, magnification, etc. See section
'Stereo Perception'.
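
The equal-y relation gives a cheap first filter for the matching problem. A minimal Python sketch under that assumption follows; it is not the (unfinished) stereo program itself. The names H and %I come from the text, while the other vertex names, all coordinates, the tolerance `tol` and the function name are hypothetical.

```python
def candidate_matches(left, right, tol=0.5):
    """For each left vertex, list the right vertices at (nearly) the same height."""
    matches = {}
    for name, (_, y) in left.items():
        matches[name] = sorted(
            other
            for other, (_, other_y) in right.items()
            if abs(other_y - y) <= tol
        )
    return matches


# Hypothetical coordinates; only the y-coordinates matter for this filter.
LEFT = {"H": (30.0, 42.0), "J": (55.0, 10.0)}
RIGHT = {"%I": (24.0, 42.2), "%K": (50.0, 9.8), "%M": (70.0, 80.0)}

pairs = candidate_matches(LEFT, RIGHT)
```

Here H is paired with %I and J with %K, while %M matches nothing. The filter only proposes candidates: several right vertices may share a height, so the other relations mentioned above are still needed to choose among them.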

FIGURE 'L2'

Even if (possibly) a face of object :4-2 is missing, 
in this case SEE makes the correct identification. 
Section 'On Noisy Input' deals with imperfect 
information. 

Scene L19


The small triangle :15 just could not get joined with
the remainder of the body :16-20-19, and two objects were found.
There is a weak link between :15 and :19, but it did not help since
there is no link between :15 and :16. What happens is that regions
:1, :15, :13 and :22 all meet forming a vertex of type MULTI; this
vertex should (in some future version of SEE) be split into two, since
both :1 and :37 are the background. The rule for this splitting
seems to be:



:11 was joined with :4, but isolated from :12-27-5. There are
no T-joints between these two nuclei that could give 'hints' (i.e.,
links) for their unification.

The two large concave objects were properly isolated. 

Compare with R19 and WRIST*. 

See 'Merged vertices', page 221 in section 'On noisy input.' 

FIGURE 'L19'

It was easy to find :6-7-8-9, the hexagonal prism.
:15 was reported as a single object: a mistake. The two big
concave objects were appropriately identified.

Scene R19


As in L19, here the triangle :27 is detached from
:5-32-33, two bodies being reported. There is no strong link between
:27 and :33. There is a weak link between :27 and :5, because both
are 'triangles' facing each other, but that is not enough. A weak
link is never enough.

All other bodies are properly found, including :10-16-2-3.

Vertex RA, of course, contributes no links.

SUGGESTION  The situation could change if we discover that RA is a
false vertex, that is, one composed by the merging of two genuine ones.
There is enough information, I think, since :34 and :37 are background,
and this will suggest a way to "divide" vertex RA into two simpler
ones. This idea of dividing vertices of type MULTI into simpler
ones should be applied with caution, since there will be genuine
vertices of type MULTI (which should not be split). The main use of
this technique will be for helping single regions to join some other
body, a task performed now, not too satisfactorily, by SMB.

Compare with L19 and WRIST*.

See 'Merged vertices', page 221.

FIGURE 'R19'

:27 was separated from :33-32-5. All other
objects were correctly found.

Scene CORN


The pyramid :8-9-10 was easily identified because a vertex
of type PEAK produces many links. At the bottom, bodies :1-2-3-4 and
:12-13-11 were separated, because the fork between :4 and :12 has the
background as a region, and did not contribute any links. Certainly,
this is a possible interpretation. Another interpretation is
to regard the object :1-2-3-4-11-12-13 as a prism with the shape
of a "C".

SINGLEBODY was needed to join :4 with :2-3-1, the only link
being placed by heuristic (g) of table 'GLOBAL EVIDENCE'.

The program knows that :22 is the background.

If we could see the hidden vertex KK (if it indeed exists),
two links would be put and we would have had one body:

FIGURE 'CORN'

The pyramid at the top was identified
properly. Two bodies were found at
the bottom, which is a plausible
interpretation: :1-2-3-4 and :11-12-13.

Scene L9


Here the tolerances SINTO and COLTO that allow for
"sloppy parallelism" have made T's out of NA and FA. Therefore,
these vertices do not contribute any links for :1. Moreover, the
"T" FA inhibits the link suggested by QA between :1 and :8.
That being all, :1 gets reported as a single body (see next page).

By decreasing the tolerances, correct identification is possible
(see the correct identification in page 155).

See 'Tolerances in collinearity and parallelism', page llS .

Four bodies are identified. Body :1-8-9-7-5-6 gives some problems.




SEE 58 ANALYZES L9

Smaller values for SINTO and
COLTO, the parameters for
parallelism and colinearity,
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(BODY 9. IS *l« *15 *2 *20 *18 Ml 



Scenes R9 and R9T


Four bodies are found in R9, five in R9T. The
difference is that Y and JA (see figure at bottom of this page) are not
"matching T's" in R9T. The strong links among :12, :3, :10, and :16 are:



LINKS FOR R9


LINKS FOR R9T


In R9, the two strong links (G0030 and G0021) between :12 and :10
were put by the matching T's Z-EA and Y-JA; of the two strong links
between :10 and :16, one was because DA is an arrow; the other,
because EA is a "T" for which heuristic (g) of table 'GLOBAL EVIDENCE'
applies.

But in scene R9T, not having Y and JA as matching T's, a link
between :10 and :12 disappears; and also nuclei :16 and :10 cannot
be linked by heuristic (g) of table 'GLOBAL EVIDENCE'. SEE decides
to report two bodies there, :3-12 and :16-10, instead of one
as in scene R9.



Are Y and JA matching
T's or not? Different
answers produce different
analyses of the scene.

These scenes show that the analyses can be quite sensitive to
the "right" definition of parallelism and colinearity.

FIGURE 'R9'

The four bodies were found.
SINGLEBODY was needed to join :18
with :6-11-1-4-2.

FIGURE 'R9T'

Scene TRIAL


This scene has been analyzed in great detail in the
section that describes the program SEE. Its links are found in
graphic form in figure 'TRIAL - LINKS', or in written form (lists)
in 'RESULTS FOR TRIAL'.

LOCAL had to join :13 with the remainder of that body.

FIGURE 'TRIAL'

Scene ARCH


SEE analyzes scene ARCH (see figure 'ARCH') with results
displayed in 'RESULTS FOR ARCH'. This is a scene composed of many
degenerate views of objects. It is an ambiguous scene (see section
on Optical Illusions), in that several good interpretations are
possible.

The program reports :7 and :17 as one body, which could be
plausible. :16, :9 and :10 get reported as independent objects. In
the scene from which this picture or line drawing was taken, :7, :17
and :16 were the vertical face of an object. :10 was the vertical
face of another, :9 being its horizontal (top) face. In cases like
this, in order to choose the "right" one of several possible
interpretations, more information has to be supplied to the program, such
as lighting, textures, color, etc.

No link was put by A between :3 and :29, or by UB between :5 and
:19, because D and W are GOODTs. In one case, G provides more
links and causes :3-8-29-31 to be reported as one body, which is
correct; in the other case, Q cannot supply any links, and that
body is split in two: :5-4 and :19-18. This is a mistake of GOODT,
which accepts W as a genuine T. If this were not the case, the arrow UB
would establish a link between :5 and :19, avoiding the mistake. GOODT
could stand some improvement.

The body :22-23 was identified correctly.

FIGURE 'ARCH'


Ambiguous scene that could be correctly interpreted in
several different manners. :7-17 was reported as a single
body (see table 'RESULTS FOR ARCH'), and also :9.

The body :5-4-19-18 was split in two: :5-4 and :19-18,
but not :3-8-29-31, which was counted as one body.

Scene HARD


This scene consists of objects of the same shape, namely
triangular prisms. All are correctly identified, including the long
and twice occluded :3-21-22-23-24-28-29. :1-2-33 was also found.

LOCAL had to be used to join :15 with :16, and also :11 with :12.

In an older version of the program, :7 was identified as a single
body, and :6 as another, because they have no visible "useful"
vertices to place links {Guzman PISA 68}. Now SEE joins :6 and :7,
because both are "GOODPALs". See 'Operation of the Program; SMB' (page
99).

These scenes are sometimes obtained from a picture, so that
they are the result of a perspective transformation. Some other
scenes are drawn more or less in an orthogonal or isometric projection.
SEE does not depend heavily on the type of projection; there are only
a few heuristics that use notions of parallelism.
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HARD 



FIGURE 'HARD' 

All the bodies were correctly found. 
The most difficult was :6-7, since SMB 
had to join both regions, which do 
not have "useful" visible vertices. 

Scene L4


The body :10-9 was reported isolated from :13-2-3,
due to insufficiency of links. See also the comments to figure R17.
The algorithm that locates matching T's could stand improvement.
It sometimes produces "bad links" such as between :4 and :13, and
between :6 and :3, because it finds two T's that look like they
are matching, EA and R in this case (this mistake did not actually
happen, because vertex R is not a T, but a Fork). The suggestion
in page I'll will lessen, but not suppress, these "mistakes".

FIGURE 'L4'

Body :2-3-13 was reported separated
from body :10-9. Too few T-joints.
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Scene R4 


The table 'RESULTS FOR R4' shows what happens when the 
tolerances are too large. Five bodies are found. Vertex B is 
considered to be a "T", and inhibits the links suggested by the Arrows 
R and A. As a result, :1 gets cut off from :7-9-5-10. 

The way :2 gets isolated is as follows: T and AA claim to be 
matching T's, the link suggested by U is inhibited by Z (a Corner), 
and :2 gets disconnected from :3-4. 

The correct solution is obtained after reducing the values of 
COLTO and SINTO to 0.05 and 0.005, respectively (see listings; COLTO 
decides if two lines are colinear, SINTO if they are parallel). The 
results also appear in 'RESULTS FOR R4', and we can see now that only 
three bodies (the correct ones) are identified. 
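
The role of the two tolerances can be sketched as follows. This is only an illustration, not the code of the listings: it assumes that both tests compare the sine of an angle against the tolerance, and the function names and these exact definitions are hypothetical.

```python
import math

def vec(p, q):
    """Vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])

def sin_between(u, v):
    """Absolute sine of the angle between two non-zero 2-D vectors."""
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(cross) / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))

def parallel(a, b, c, d, sinto):
    """Assumed test: a-b and c-d are parallel if the sine of the angle
    between their directions is below the tolerance sinto."""
    return sin_between(vec(a, b), vec(c, d)) < sinto

def colinear(a, b, c, d, colto):
    """Assumed test: c-d is colinear with a-b if both c and d lie on the
    line through a and b, within the angular tolerance colto seen from a."""
    return all(p == a or sin_between(vec(a, b), vec(a, p)) < colto
               for p in (c, d))
```

With these definitions, a pair of slightly misaligned lines passes the tests under a large tolerance and fails them under a small one, which is the kind of change that turns three bodies into five in R4.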

Suggestion. Lines like the one below should be "straightened" 
either by SEE or (better) by the preprocessor; for 
example, B K L N and D G H 0 in figure R17. See section 'On Noisy 
Input'. 



Conservatism and Tolerance 


Stricter tolerances do not make the 
program more conservative in all cases: the link in (a) fails to be 
placed if the program has too loose (large) tolerances, because A 
will be considered to be a "T", 
losing the link; the link in (b) fails to be laid if the tolerances 
are too strict, because the T-joints will not be colinear. 



In (a), links disappear if tolerances are 
too big; in (b), if they are too small. 
In both cases, conservative behavior (cf. 
page Z\%) appears. 
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FIGURE 'R4' 

Either three or five bodies are found, according to the values of 
certain parameters. These scenes are "noisy" in the sense that 
the coordinates of the vertices depart from their "ideal" position 
by as much as one millimeter, or about 1% of the total size of 
the image, which is about one decimeter. This error is not large 
enough to affect long lines, but it may substantially change the 
direction of short segments. 
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Scene MOMO 


The long body :29-30-34-20-19 gets identified as follows: 
:29 and :30 get two links, and :30 with :19 also, so we have the 
nucleus :29-30-19. Two links (because of matching T's) join :34 with 
:20, to form nucleus :34-20. Regions :30 and :34 receive a strong 
link, by heuristic (g) of table 'GLOBAL EVIDENCE', and :19 with :20 
for the same reason. That completes the body. 

The Fork that is common to :12, :13 and :14 puts a link between 
:12 and :13, but it is not enough to cause mis-recognition. A link 
is put by that same Fork between :13 and :14, as it should be, but 
the link between :12 and :14 is inhibited by NOSABO. 

There is a program that finds regions of a scene belonging to 
the background, when not indicated as such in the input. For MOMO, 
the results of this program appear in page 131. 
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MO MO 



FIGURE 'MOMO' 

All bodies are correctly identified. 
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SEE 56 ANALYZES MOMO 
EVIDENCE 
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Region :10 gets a strong and a weak link with :4, and that 
is enough to join them. The same is true for :7. 

The links of scene BRIDGE (see 'RESULTS FOR BRIDGE') are discussed 
and displayed in pages 9*-98, figures 'LINKS-BRIDGE' (page 95), 
'NUCLEI-BRIDGE' (page 96), 'NEW-NUCLEI-BRIDGE' (page 97), and 
'FINAL-BRIDGE' (page 98). 

Because RA and SA are matching T's, two wrong links are placed: 
one between :22 and :28, and the other between :21 and :29. This is 
not enough to cause an error, because we need two mistakes reinforcing 
each other (two wrong strong links) to fool the program. But 
that could happen. 

It is interesting to note the way in which the long "horizontal 
table" :25-24-21-27-9-12 was put together. To this effect, see figures 
'LINKS-BRIDGE' and 'NUCLEI-BRIDGE'. 

Vertex JB produces only one link, between :5 and :8. Vertex KB 
inhibits the link (through NOSABO) between :8 and :9, and the link between 
:5 and :9 gets inhibited by S, because it is a T (cf. NOSABO, page 82). 

The concave object :7-6-5-4-8-10-11 gets properly identified. 

We may say that, in general, the more "crooked" or complicated an object 
is, the easier it will be for SEE to isolate it, because there will be 
many vertices contributing valuable links. 

No mistake was made by SEE on BRIDGE; its eight bodies were 
correctly identified (see 'RESULTS FOR BRIDGE', page ISO-). 

The background of 'BRIDGE' was also correctly isolated; see 
page ZBO, section 'On background discrimination by computer'. 
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BRIDGE 



FIGURE 'BRIDGE' 
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DISCUSSION 


We have described a program that analyses a three-dimensional 
scene (presented in the form of a line drawing) 
and splits it into “objects” on the basis of pure 
form. If we consider a scene as a set of regions (surfaces), 
then SEE partitions the set into appropriate subsets, 
each subset forming a three-dimensional body or 
object. 

The performance of SEE shows us that it is possible 
to separate a scene into the objects forming it without needing 
to know these objects in detail; SEE does not need 
to know the ‘definitions’ or descriptions of a pyramid, or 
a pentagonal prism, in order to isolate these objects in a 
scene containing them, even in the case where they are 
partially occluded. 

The basic idea behind SEE is to make global use of information 
collected locally at each vertex: this information 
is noisy, and SEE has ways to combine many different 
kinds of unreliable evidence to make fairly reliable 
global judgments. 

The essentials are: 

(1) Representation as vertices (with coordinates), 
lines and regions 

(2) Types of vertices. 

(3) Concepts of links (strong and weak), nuclei and 
rules for forming them. 

The current version of SEE is restricted to scenes presented 
in symbolic form. 

Since SEE requires two strong evidences to join two 
nuclei, it appears that its judgments will lie on the 
‘safe’ side, that is, SEE will almost never join two regions 
that belong to different bodies. From the analysis 
of scenes shown above, its errors are almost always of 
the same type: regions that should be joined are left 
separated. We could say that SEE behaves “conservatively,” 
especially in the presence of ambiguities. 
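
The conservative joining rule can be sketched in a few lines. This is a simplified illustration, not the program of the listings: it assumes the links have already been produced as (region, region, kind) triples, and it keeps a single rule, namely that two nuclei are merged only when at least two strong links run between them. The names `merge_nuclei`, `links` and `needed` are hypothetical.

```python
from collections import Counter

def merge_nuclei(regions, links, needed=2):
    """Repeatedly merge any two nuclei joined by at least `needed`
    strong links; weak links are ignored in this simplified sketch."""
    nucleus = {r: frozenset([r]) for r in regions}  # region -> its nucleus
    changed = True
    while changed:
        changed = False
        strong = Counter()
        for a, b, kind in links:
            na, nb = nucleus[a], nucleus[b]
            if kind == "strong" and na != nb:
                strong[frozenset([na, nb])] += 1
        for pair, n in strong.items():
            if n >= needed:
                na, nb = tuple(pair)
                merged = na | nb
                for r in merged:
                    nucleus[r] = merged
                changed = True
                break  # recount the links between the new nuclei
    return set(nucleus.values())
```

A region that receives only one strong link from a nucleus stays separate under this rule, which is the "conservative" kind of error described above: a body may be split, but two bodies are rarely fused.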

Division of the evidence into two types, strong and 
weak, results in a good compromise. The weak evidence 
is considered to favor linking the regions, but this evidence 
is used only to reinforce evidence from more reliable 
clues. Indeed, the weak links that give extra 
weight to nearly parallel lines are a concession to object-recognition, 
in the sense of letting the analysis system 
exploit the fact that rectangular objects are common 
enough in the real world to warrant special attention. 

Most of the ideas in SEE will work on curves too. 
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CURVED OBJECTS 


How to extend SEE to work with objects possessing curved surfaces. 


Most of the heuristics that establish links 
at each vertex are unconcerned whether the edges are curved or straight; a 
few heuristics are affected: those that use the concepts of collinearity 
and parallelism. 

Thus, it is necessary to redefine and broaden these concepts. 

1. A slight generalisation is obtained if each segment is represented 
as having two slopes (initial and final). The functions PARALLEL and 
COLINEAR of SEE are already modified for this (cf. listings). 

SEE does not care if the line joining two vertices 
is a straight or curved line. The information 
about the segment A-B that is relevant to SEE is: 

(a) There is a line between vertex A and vertex B. 

(b) The coordinates of A and B. 

(c) The segment A-B separates region :1 from :2. 
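
A segment carrying exactly this information, plus the two slopes, could be represented as below. This is a sketch under assumptions: the field names are hypothetical, slopes are stored as direction angles in radians, and how PARALLEL and COLINEAR consume the two slopes is defined in the listings, not here.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    a: str             # name of the first vertex
    b: str             # name of the second vertex
    xy_a: tuple        # coordinates of a
    xy_b: tuple        # coordinates of b
    region_1: str      # region on one side of the segment
    region_2: str      # region on the other side
    slope_a: float     # direction of the line as it leaves a (radians)
    slope_b: float     # direction of the line as it arrives at b (radians)

    def slope_at(self, vertex):
        """Local slope seen from one of the two end vertices."""
        if vertex == self.a:
            return self.slope_a
        if vertex == self.b:
            return self.slope_b
        raise ValueError(f"{vertex} is not an endpoint of this segment")

def straight(a, xy_a, b, xy_b, region_1, region_2):
    """A straight segment has the same slope at both ends."""
    angle = math.atan2(xy_b[1] - xy_a[1], xy_b[0] - xy_a[0])
    return Segment(a, b, xy_a, xy_b, region_1, region_2, angle, angle)
```

A heuristic working at vertex A would then ask for `slope_at("A")` and never need to know whether the line between A and B is straight or curved.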

2. Attempts to take limited account of the shape of the segment carry 
us to: 

(a) Gently bent segments (definition) are those with bounded slope 
[bounded curvature will lead to another definition]. 

A quasi-rectilinear object has faces, vertices and gently 
bent edges or segments; it is expected that SEE will work 
well for them. We should try some scenes. 




a, b: gently bent segments, c: non-gently bent 
segment. A gently bent segment has a slope that 
at any point of the segment does not differ by more 
than epsilon from the mean slope of the segment. 

All slopes fall in an interval around the mean 
slope. Gently bent segments form quasi-rectilinear 
objects. 
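
The definition of a gently bent segment can be written down directly. The sketch below assumes the segment is given as a list of sampled points, takes the "mean slope" to be the arithmetic mean of the directions of the successive pieces, and ignores angle wrap-around near 180 degrees; all three are simplifying assumptions, and the function names are hypothetical.

```python
import math

def piece_slopes(points):
    """Direction (radians) of each piece between consecutive sample points."""
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def is_gently_bent(points, epsilon):
    """True if every local slope lies within epsilon of the mean slope."""
    slopes = piece_slopes(points)
    mean = sum(slopes) / len(slopes)
    return all(abs(s - mean) <= epsilon for s in slopes)
```

A straight line is gently bent for any epsilon, while a right-angle bend is rejected unless epsilon is very large.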
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Quasi-rectilinear objects. It is expected 
that SEE will work well for them. 


(b) Partition of a non-gently bent segment into several gently 
bent ones. 

Many of the bodies have vertices and curved edges, 
but the bodies are not quasi-rectilinear (a piece of chewed 
gum, leaves of a tree). By breaking the edges into gently 
bent sub-segments, they become quasi-rectilinear bodies. 

The breaks will occur at points where the curvature is large. 
A way has to be devised to break a segment in a unique 
manner. To avoid breaking a body into two by the introduction 
of these artificial vertices, we propose to introduce 
also artificial links between regions, to account for the 
artificial vertex. 

The non-gently bent segment ab 
gets broken into gently bent segments 
ak, kl, lm, mb, by the 
artificial introduction of "new" 
vertices k, l, m. 



Here, the introduction of 
additional vertices has to 
be accompanied by 'artificial' 
or reinforcing links, 
to preserve the individuality 
of the body (of the 
owner of such vertices). 
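
One deterministic way to break a segment is sketched below. It is an assumption, not the method of the thesis (which leaves the unique partition to be devised): the curve is walked from one end, and a new piece is started whenever adding the next sample point would make the current piece stop being gently bent. The function names are hypothetical, and angle wrap-around is ignored.

```python
import math

def gently_bent(points, epsilon):
    """Every local slope lies within epsilon of the mean slope."""
    slopes = [math.atan2(y1 - y0, x1 - x0)
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    mean = sum(slopes) / len(slopes)
    return all(abs(s - mean) <= epsilon for s in slopes)

def split_segment(points, epsilon):
    """Greedy left-to-right partition of a sampled curve (two or more
    points) into gently bent pieces. Returns the pieces and the
    artificial vertices introduced between them."""
    pieces, current = [], [points[0], points[1]]
    for p in points[2:]:
        if gently_bent(current + [p], epsilon):
            current.append(p)
        else:
            pieces.append(current)
            current = [current[-1], p]  # new piece starts at the break point
    pieces.append(current)
    artificial = [piece[-1] for piece in pieces[:-1]]
    return pieces, artificial
```

The returned artificial vertices are exactly the places where reinforcing links between the adjoining regions would have to be added, so that the body is not split in two by the new vertices.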



3. More complete consideration of the shape of the segments is obtained 
as follows: 

(a) For parallelism, by requiring that two segments be parallel 
only if one is a translation of the other. Generally, this 
is a comparison that takes a time proportional to the length 
of the segment. Chain encoding {Freeman} {Conrad} is suggested. 
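
The translation test is easy to state with an 8-direction chain code. The sketch below assumes each segment has already been digitized as a list of grid cells in which consecutive cells are 8-neighbours; it requires an exact match, so tolerance to noise is not addressed, and the function names are hypothetical.

```python
# 8-direction chain code: 0 = east, then counter-clockwise in 45-degree steps.
STEP_TO_CODE = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
                (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(cells):
    """Chain code of a digitized curve given as consecutive grid cells."""
    return [STEP_TO_CODE[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(cells, cells[1:])]

def is_translation(cells_a, cells_b):
    """True if curve b is a translate of curve a, traversed in either
    direction. The chain code does not depend on position, so one linear
    comparison of the codes is enough."""
    a, b = chain_code(cells_a), chain_code(cells_b)
    b_backwards = [(c + 4) % 8 for c in reversed(b)]  # same curve, other end first
    return a == b or a == b_backwards
```

The comparison visits each code once, which matches the remark that the time is proportional to the length of the segment.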




(b) For colinearity, by discovering properties or features that 
"carry through" or are common. Among these are: 

1. Mathematical "regularity" of the segments. Both segments 
are described by the same or similar polynomials, etc. 

2. Heuristic properties: there must exist properties which 
will select with high probability the "right" continuation. 

3. Outside of the set of geometric properties, we have 
color, texture, etc. 

Alternatively, we may forget these properties here and include 
them into models of our curved objects, but then we are forced 
to make searches in our scene like those made by DT or TD 
{my M.S. Thesis}. 



Fig. 'SUITCASES' 

Heuristic properties of segments (yet to be 
determined) could select a "correct" match 
for endings a, b, ..., k,l. 
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4. Bodies with no edges and vertices are in principle easily identified. 



The bodies have no curved edges, and no vertices. The entire 
surface is smooth; no sharp edges or pointy corners. Examples: 
an inflated balloon, a frankfurt, a face, a cloud. 

It is doubtful that we could do something here with SEE. We 
could try to postulate "artificial" vertices, using stereo perhaps, 
at the points where the 3-dim curvature is large, and then postulate 
lines between such vertices. This looks bad. 

Or we could reason as follows: since these objects do not 
have vertices or edges, the only vertices appearing in the 
scene must separate two bodies. They will be mainly T-joints 
(cf. also page 46). 

In principle, separation into bodies looks promising, but 
recognition (the answer to "what is the name of this object?") 
seems difficult. Nevertheless, it is not clear that with such a 
simple set of heuristics we could work successfully with objects 
as complicated as a human face, a blob of falling water, an 
amoeba, the surface of the sea (?). 


At some point, we have to know what we want. As the complexity 
increases, the concept of "body" depends less and less on geometrical 
properties (disposition of edges, vertices) and more and more 
on purpose (Is a skeleton an object? Or perhaps the femur bone alone? 
The answer varies with our intention -- with the context). 

Thus, models are necessary again. 

See also 'Do not use over-specialized assumptions', page 252. 
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APPENDIX TO SECTION ON CURVED OBJECTS 


This appendix may be omitted in a first reading. 


Requirements for the preprocessor. A preprocessor that feeds data 
to SEE has to find only: 

1. The lines of the scene. 

2. The vertices. 

3. The local slopes at each vertex. 

4. See also comments to figure R17. 

5. Illegal scenes (page 217) should be detected by the preprocessor. 




How well will curved objects be handled? For objects 
where the curved edges are gently bent, SEE 
will work fairly well. The more an edge 
departs from its rectilinear equivalent, 
the worse SEE will work; T-joints will be 
difficult to find, a FORK may transform 
into a 'T', etc. (I am talking about the 
current SEE, described in the listings). 



Additional information could be used. So far, we are trying to identify 
objects on the basis of form alone, i.e., geometrical considerations. 
This is asking a machine to do more than a human being does. 
Ambiguous line drawings, such as ARCH, become unambiguous when we 
introduce shading, lighting, texture, color, etc. All of these properties 
could be used by SEE. In fact, consider how easy it would be 
to identify bodies if each one of them is of different color (and we 
could sense that fact). 


Psychological evidence. Knowledge of the algorithms used by human 
beings for shape continuation (page 188) is relevant. We quote from 
Krech and Crutchfield {1958}: 
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Grouping by Good Form. Other things 
being equal, stimuli that form a good figure 
will have a tendency to be grouped. This 
is a very general formulation intended to 
embrace a number of more specific variants 
of the theme, traditionally classified as follows. 

1. Good continuation. The tendency for 
elements to go with others in such a way as 
to permit the continuation of a line, or a 
curve, or a movement, in the direction that 
has already been established (see Fig. 37 c). 

2. Symmetry. The favoring of that 
grouping which will lead to symmetrical 
or balanced wholes as against asymmetrical 
ones. 

3. Closure. The grouping of elements in 
such a way as to make for a more closed or 
more complete whole figure. 

4. Common fate. The favoring of the 
grouping of those elements that move or 
change in a common direction, as distinguished 
from those having other directions 
of movement or change in the field. 

It seems plausible to consider that the 
percepts resulting from all of the above 
determinants would be such as to meet the 
criterion of a good figure, that is, one that 
tends to be more continuous, more symmetrical, 
more closed, more unified. 

Now the reader will see that a difficulty 
with this general proposition regarding 
grouping centers on the crucial phrase 
“good figure.” How can we know which 




FIG. 37. Examples of grouping. In a, the dots 
are perceived in vertical columns, owing to 
their greater spatial proximity in the vertical 
than in the horizontal direction. In b, with 
proximity equal, the rows are perceived as 
horizontal, owing to grouping by similarity. In 
c, the principle of good continuation results in 
seeing the upper figure as made up of the two 
parts shown to the left below, even though 
logically it might just as well be composed of 
the two parts shown to the right below, or indeed 
of any number of other combinations of 
two or more parts. (Adapted from Wertheimer, 
1923.) 
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BOX 21 

How to Measure “Goodness” 


Attneave has made an ingenious experimental 
attack on the problem of measuring 
the “goodness” of a figure. The subject is 
given a sheet of graph paper composed of 
4000 tiny squares (50 rows by 80 columns). 
His task is to guess whether the color of 
each successive square is black, white, or 
gray. The experimenter has in mind what 
the completed figure will look like (fig. a). 

Without knowing what the completed 
figure will be, the subject starts by guessing 
the square in the lower left corner. When 
he has correctly identified the color, he 
moves on to guess the next square to the 
right. He continues this process to the end 
of the row and then starts on the left end 
of the next row above. In this manner he 
successively guesses each of the squares. 

On the average, Attneave’s subjects made 
only 15 to 20 wrong guesses for the entire 
figure. How was this possible? The answer 
is that the figure was deliberately designed 
so that knowledge of parts of the figure 
was sufficient to enable the subject to make 
fairly valid predictions about the remainder 
of the figure. This was accomplished by 
making all the white squares contiguous 
with one another, and similarly the black 
and the gray squares. Moreover, the contours 
separating the white, black, and gray 
areas are simple and regular. Where the 
figure tapers, it tapers in a regular way. 
And it has symmetry; after exploring one 
side, it is easy to predict the other side. 
Thus, the subject having discovered that the 
first few squares are white continues to guess 
white, and he is correct until he hits the 
gray contour at the itth column. After one 
or two errors, he then continues to guess 
gray. On the next row above, he tends to 
repeat the pattern of the first. 

All these factors of compactness, symmetry, 
good continuation, etc., are aspects of 
what is implied by a “good figure.” Thus an 
objective measure of the “goodness” of a 
figure is the ease with which the subject 
can predict its total form from minimal 
information about a part. 

Other figures can be similarly tested. For 
example, figure b would prove to be a less 
“good” figure because the number of errors 
in guessing would be larger. 

Attneave's particular method will not, of 
course, apply to all kinds of figures or all 
kinds of perceptual organizations. But it 
does demonstrate that there are ways in 
which “goodness” can be objectively determined. 



configuration of stimuli is “better” than 
another? 

To escape from this difficulty, we need 
to have independent criteria of what is a 
good figure. Some approach can be made 
to this; for instance, in the case of “symmetry” 
there are objective rules we can 
apply to determine the relative symmetry 
of various figures. The same is true of simple 
cases of “closure.” (See Box 21 for a 
relevant experiment.) 


But we are far from being able to state 
such criteria when we deal with the highly 
complex configurations of our normal perceptual 
experience. Part of the difficulty 
stems from the fact of individual differences 
among perceivers. One man’s mess 
may be another man’s order. And this may 
reflect the important role of learning and 
past experience in the genesis of “good 
figure.” 
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ON OPTICAL ILLUSIONS 




il-lu-sion n [ME, fr. MF, fr. LL illusion-, illusio, fr. 
L, action of mocking, fr. illusus, pp. of illudere 
to mock at, fr. in- + ludere to play, 
mock -- more at LUDICROUS] 1 a obs 
: the action of deceiving b (1) : the state or 
fact of being intellectually deceived or misled 
: MISAPPREHENSION (2) : an instance of such 
deception 2 a (1) : a misleading image 
presented to the vision (2) : something that 
deceives or misleads intellectually b (1) : perception 
of something objectively existing 
in such a way as to cause misinterpretation 
of its actual nature (2) : HALLUCINATION 1 
(3) : a pattern capable of reversible perspective 
3 : a fine plain transparent bobbinet 
or tulle usu. made of silk and used for 
veils, trimmings, and dresses syn see DELUSION 
-- illusional adj -- illusionary adj 
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Given the nature of SEE, we will restrict the meaning of 'optical 
illusion' to illusions formed by solids, that is, ambiguities or 
inconsistencies when we (or the program SEE) try to find 3-dim bodies 
in a scene; thus, the Muller-Lyer Illusion ("A" in the topmost figure) 
is not considered. 

Three kinds of illusions. According to this, we may elementarily 
classify the "scenes that are unlikely to occur" (that is, those 
that are not "standard" or "normal") in three types: 

- Possible but no "good" interpretation. 

- Ambiguous: several good interpretations. 

- Impossible: without interpretation. 

Like POLYBRICK {Guzman}, SEE is not specifically designed to 
handle optical illusions. It was primarily designed to analyze "real 
world" scenes; hence, an input scene that produces an illusion (in 
a human) is not likely to occur as input to SEE. Nevertheless, in 
the same way that we may overtest a program for square roots by asking 
for the square root of 'APPLE', we may test SEE with some 
ambiguous scenes. Let us see what happens. 

POSSIBLE BUT NO "GOOD" INTERPRETATION 

Some objects do not 'make sense' 
because they violate rules that most objects obey. Nevertheless, it 
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ACTUAL IMPOSSIBLE TRIANGLE was constructed by the author and his colleagues. 
The only requirement is that it be viewed with one eye (or photographed) from exactly 
the right position. The top photograph shows that two arms do not actually meet. When 
viewed in a certain way (bottom), they seem to come together and the illusion is complete. 

(From Gregory). 
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One of the strong rules used by humans is that objects whose pictures 
show straight lines have indeed straight edges; another strong 
rule is to assume the corners to be like the corners of a cube (faces 
meeting at right angles). Under these rules, the above triangle 
does not make sense and people will classify it as an "impossible" 
object ('VARIANT' will be an "impossible" object; Penrose's Triangle 
will be "3 sticks forming an impossible configuration or scene", 
"mounted in a funny way"; it cannot be seen as representing a single 
object lying in space). For instance, Gregory (Scientific American) 
tries to explain that the triangle has a real 3-dim object as originator, 
by constructing a body consisting of three rectangular 
parallelepipeds ("bricks") joined at right angles, and then taking a 
picture from a special direction, so that the free ends a and b 
seem to touch: 




These rules (faces meet at right angles; straight lines mean 
straight edges) are deeply ingrained into people, but nature does not 
need to follow them always. The Penrose Triangle can be obtained by 
photographing a 3-dim triangle with curved edges and skewed corners, 
where each side touches the other two. 

SEE finds three objects in figure 'Penrose Triangle.' 
Other examples follow. 



Figure 'BLACK' 

People assume that faces meet at 
right angles, and this object 
violates that rule, making it 
"impossible" or odd-looking. 
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It is possible to construct object 'BLACK' with planar faces. See 
figure 'TEST OBJECTS', page 209. SEE finds one body in 'BLACK'. 


The object at right looks 
impossible if we assume all 
faces to be flat. If face aeb 
is curved, the object is plausible. 
R is its reflection on mirror 
M, and Q a smoother version 
of R. Q looks "normal"; by 
deforming Q we could obtain R. 

Unlike humans, SEE does not 
hold these "very common rules" 
as inviolable; SEE does not 
have any special problems with 
these "strange but true" 
objects. 



A misleading suggestion of 
superiority should not be concluded 
from these rare cases; in other 
situations SEE makes mistakes 
that a human being does not 
(see figure 'SPREAD'). 



A 


Of course, SEE holds its own 
rules (for example, those of 
table 'Global Evidence') as inviolable; hence, given a "rare enough 
scene" it will make mistakes (cf. assertion in page S'*, after the 
Theorem). This is a similarity of behavior, I think, between people 
and SEE: each one follows rather rigidly a small set of rules 
(see also conclusion at end of section). 
Besides, often humans will see the 'impossible' object as an 
object, doing SEE's job just as well. 
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SEE will generally give one of the possible answers, although 
not necessarily the one preferred by humans. In this example, SEE 
chose ( B ). 

The following scene, locally ambiguous, is correctly parsed by 
our program. 



Sometimes, the conservatism of SEE and its limited 
ability to make very global judgements will leave a body 
unconnected; for instance, the three faces of one cube below will 
each be reported as a separate object, due to insufficient 
links. 




IMPOSSIBLE: WITHOUT INTERPRETATION 

Images that cannot be the product 
of photographing (projecting) a 3-dim scene. These objects do not 
have physical existence. 


This scene is without 
interpretation, meaning 
no 3-dim scene (with 3-dim 
bodies) could have 
produced it. 



In figures like the one above, people are unaware of the extension 
of the background, and the figure seems to make sense even if B is 
background. SEE is unable to make this mistake, and its analysis of 
the scene will reflect the fact: the preprocessor will complain that 
one region, the background, is neighbor of itself. See comments to 
scene R3, page 113. 

Of course, in these cases there is no answer to the question 
"which are the bodies in the scene?" Whatever answer SEE (or anybody 
else) gives, it is wrong. 

Nevertheless, according to our metatheorem (page 33), there is 
an extremely easy way to discover and reject these impossible scenes: 
all of them are necessarily illegal scenes (q.v., page 217). And we know 
how to detect illegal scenes. SEE (or its preprocessor, rather) already does that. 

SEE detects all impossible scenes, by refusing the data as an 
illegal scene. 
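
One of the checks mentioned above, a region that is neighbor of itself, is simple to express. The sketch below covers only that single condition and is not the full definition of an illegal scene given on page 217; the input format (one tuple per line of the scene, naming its two vertices and the two regions it separates) and the function names are assumptions.

```python
def self_neighbouring_regions(lines):
    """Regions that appear on both sides of the same line.
    `lines` holds tuples (vertex_a, vertex_b, region_1, region_2)."""
    return sorted({r1 for _, _, r1, r2 in lines if r1 == r2})

def passes_self_neighbour_check(lines):
    """False if some line fails to separate two different regions."""
    return not self_neighbouring_regions(lines)
```

A preprocessor could run such a check before handing the scene to SEE and refuse the data when it fails.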
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A PROGRAM TO DISCOVER HUMAN OPTICAL ILLUSIONS 

Some scenes get classified by our metatheorem as 'possible but 
no "good" interpretation', and likewise by SEE, which does not refuse 
to analyze any legal scene. 

Nevertheless, a person will stubbornly classify them as 'odd-looking' 
or 'not making sense' or 'impossible', even if we teach him 
the solution obtained by SEE (figures 'Penrose Triangle', 'Black', 
'Staircase', 'CONTRADICTORY'). 



Figure 'CONTRADICTORY' 

One object is found by SEE: (:1 :2 :3 :4). 
As such (since it is a legal scene), SEE 
classifies it as 'possible but no "good" 
interpretation'. A person will classify 
it as "not making 3-dim sense": a human 
optical illusion. Is it possible to 
reconcile these views? 


Of course, the metatheorem (page ^9 ) insures that there is at 
least one solution, so SEE's interpretation is "right" (it has chosen 
one correct answer, generally not the trivial solution given by the 
metatheorem), and the mortal is wrong. Also, the theorem of page 50 
insures that any system (human or computer) that uses too "local" 
rules (see fig. 'MACHINE') will make at least one mistake, no matter 
what rules he (or it) uses. 
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H-optical illusions. There is thus a disagreement between SEE and our 
fellow subject, because SEE has classified the scene as 'possible but 
no "good" interpretation' and our man has said 'contradictory as a 
three-dimensional scene'. Let us call these human optical illusions (such 
as 'Contradictory', 'Staircase', etc.) by the name h-optical illusions. 


What to do in these disagreements? Who is right? 

SEE is right. The above comments seem to indicate that the electronic 
data-processor is correct. The human has used excessively "local" 
rules. That being the case, we can teach and train (if avoiding 
future errors is desirable) our subjects to "understand", rationalise 
and make sense out of these h-optical illusions. Indeed, that is what 
is tried in figures 'Black', 'Penrose Triangle', etc. Different 
people may show different degrees of (h-optical) illusion before 
training and after training (see Box). This training is possible 
(see Box). 

In other words, if SEE is right, the computer scientist has 
nothing to do; it is all up to the psychologists and educators. 

Man is right. We may hold the view that the human answer is still 
preferable. Then, to our relief, man is right and SEE is wrong. 

It is necessary (perhaps) to modify and correct SEE, so as to emulate 
personal behavior. We suggest a way to do this. 


A program to discover h-optical illusions. It is possible to enable 
SEE to detect these h-optical illusions, so that it will classify the legal 
scenes into "possible" or "h-optical illusions." 

SUGGESTION. Like the problem of discriminating between background 
and objects (see section 'On background discrimination by Computer'), 
this is an interesting project from the "psychological" point of view 
but, as in the background case, it is not essential at the moment 
for our vision-robot work. 


Strictly, there is a third possibility: both are wrong. 
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BOX 


There is generally a wealth of available information though none entirely 
reliable—for settling the size and distance of external objects, with sufficient 
precision for normal use. As is well known, the visual system makes use of 
a host of ‘depth cues’, such as gradual loss of detailed texture with increasing 
distance, haziness due to the atmosphere and nearer objects partly hiding 
those more distant. These cues were discussed in the nineteenth century 
by the great von Helmholtz (1925), who fully realised their importance, and 
they have been the subject of many investigations since, especially by 
J. J. Gibson (1950). Whatever the richness of depth cues, however, the visual 
input is always ambiguous. Though the brain makes the best bet on the 
evidence—it may always be wrong. 

The kind of mistakes which occur when the bet is on the favourite though 
the favourite is not placed, is shown most dramatically by the demonstrations 
of Adelbert Ames (1946). The most impressive demonstration is given 
simply with a room which is non-rectangular, but so shaped that it gives the 
same retinal image as a rectangular room to an eye placed in a certain 
position. Now clearly this room, though queer shaped, must appear the 
same as a normal rectangular room, for it gives the same image to the eye. 
But consider what happens when objects are placed inside the Ames room. 
The further wall recedes at one side, so that an object or person standing in 
one corner is actually at a different distance than is a second object placed 
at the other far corner. These objects (or people) appear, however, to be 
at the same distance—and they are seen the wrong size. This is clear evidence 
that we assume rooms to be rectangular (because they usually are) and we 
interpret the size of objects according to their distance as given by this 
assumption. When the assumption is wrong we see wrongly. What Ames 
did was to rig the odds, and then we make the wrong decision on size and 
distance. A child may appear larger than a man. We may know this is 
absurd and yet continue to see a bizarre world. The retinal image is all 
right, but the odds have produced the wrong internal file cards and then the 
human seeing machine is upset, and gives a wrong answer. 

It is interesting that the Ames room is seen correctly by peoples, such as 
the Zulus, brought up in a ‘circular culture’ of beehive huts where there are 
few reliable perspective features, such as rectangular corners and parallel 
lines, in their visual environment. To the Zulus, the odds are not rigged by 
the Ames room—to them this is not misleading perspective. They are not 
subject to this illusion, but accept the room as the shape it is, and see the 
objects in it correctly in distance and size. This is a matter of very real 
importance. It shows that when we are transferred to an alien or bizarre 
environment, where our filing cards are inappropriate, we interpret the 
images in the eyes according to principles found reliable in the previous, 
familiar world -- but now they may systematically mislead and then perception
goes wrong. Space travellers beware! {Gregory, in {Collins and Michie}}

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A possible way to attack the problem is:

(1) To identify each link with whoever proposed it. 

(2) To set up systems of simultaneous "symbolic" equations. 

(3) To solve them by elimination.

We elaborate: 

(1) Mark each link with the name of the heuristic that produces it.
After obtaining the 'maximal' nuclei by GLOBAL and LOCAL, several
links are left (for example, three in fig. 'FINAL-BRIDGE')
and ignored by the current SEE. Instead, one could see what
kind of links they are, and in this way obtain more information
about the type of contradictions in the scene.

(2) Introduce a 'conditional' link: regions :1 and :2 belong to
the same body if region :3 does not. An OR link is now possible
by use of the conditional, since (a IMPLIES b) is equivalent
to (b OR NOT a).

(2.3) Introduce a 'NOT' link: :3 ≠ :5, regions :3 and :5 do not
belong to the same body.

(2.6) As in ordinary algebraic equations, a system of n simultaneous
equations means that all of them must be satisfied;
the "AND" of all must be true. Thus, AND is implicit in our
notation. So far, we have OR, AND, NOT, IMPLIES (conditional):
we have more than necessary.

At the end, we have a system of simultaneous equations
like these, where :1 = :2 means both belong to same body; this
is an equivalence relation so I use the = sign:

:1 = :2 OR :3 = :5

:3 = :2 IMPLIES :1 = :4          (E)


We now proceed to "solve" these equations. Three things could happen:

== Exactly one solution is found. This is the normal case, and
that solution tells what the bodies are. Familiar, "clear", possible
scenes will fall in this case.

== More than one solution is found consistent with our equations.
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All are reported. This is the case "Ambiguous — several good 
interpretations." 

== No solution is found. This is a genuine optical illusion,
corresponding to a contradiction in the equations. For instance, in
fig. 'CONTRADICTORY', equations set by the T-joints between :2 and
:3 would be inconsistent with those set by the Arrows and Forks.


How to solve the equations (E). By the solution to (E) we mean a division of
the scene (:1, :2, ..., :n) by means of a partition of the form

(:1 = :5 = :7 = :6),

(:3 = :2),

(:4)

which is consistent with (E).

In the current SEE,

(a) The equations are only equalities: :1 = :2.

Also, equations of the type :1 ≠ :2 are taken into
account by inhibitory mechanisms, such as NOSABO.

No conditional links exist.

(b) Since all equations are of the type :2 = :3, the solution
is obtained by applying transitivity, that is,

1 = 2
2 = 3   ==>   (1 = 2 = 3)        parentheses indicate nuclei.



Except that we require two antecedents for application
of transitivity (two strong links):

1 = 2
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An exhaustive search (which successively tests each possible partition)
for the solution to (E) is impractical except in very small
scenes, and heuristic methods are needed.

I suggest starting from the equalities such as

1 = 2
2 = 3

and forming nuclei as with the current SEE, except that at each step
we check to see if our current nuclei satisfy all of (E); for
disjunctive equations such as "4 = 5 OR 6 ≠ 7 OR 4 = 6"
we try each branch of the OR in turn, rejecting those that lead to
no solution (this may be pretty combinatorial, too).

Perhaps it is possible to use more Logic here — some sort of 
theorem proving. 
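
The branch-by-branch search suggested above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions, not the
procedure of SEE: the encoding of each equation as a list of alternatives
("=" or "!=" between two regions), the names find and solve, and the plain
merging of equalities (which ignores SEE's requirement of two strong links)
are all hypothetical choices made for the example.

```python
def find(parent, x):
    """Return the representative of the nucleus containing region x."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # shorten the path as we walk up
        x = parent[x]
    return x


def solve(regions, clauses):
    """Return the set of partitions obtained by satisfying one
    alternative of every equation.

    clauses is a list of equations; each equation is a list of
    alternatives (op, a, b) joined by OR, with op "=" or "!=".
    """
    solutions = set()

    def search(i, equal, unequal):
        if i == len(clauses):
            parent = {r: r for r in regions}
            for a, b in equal:
                parent[find(parent, a)] = find(parent, b)
            # A NOT link between two merged regions is a contradiction.
            if any(find(parent, a) == find(parent, b) for a, b in unequal):
                return
            groups = {}
            for r in regions:
                groups.setdefault(find(parent, r), []).append(r)
            solutions.add(frozenset(frozenset(g) for g in groups.values()))
            return
        # Try each branch of the OR in turn.
        for op, a, b in clauses[i]:
            if op == "=":
                search(i + 1, equal + [(a, b)], unequal)
            else:
                search(i + 1, equal, unequal + [(a, b)])

    search(0, [], [])
    return solutions
```

An empty result corresponds to the case "no solution" (a contradiction in
the equations), a single partition to the normal case, and several
partitions to the ambiguous case. The search is exponential in the number
of disjunctive equations, as remarked above.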

Conclusions and conjectures

1. The similarities between SEE and people

(see also 'Human perception vs. computer perception', page 254) stem
from the fact that, like SEE, people seem to use only a small number
of rules (although not necessarily those used by SEE), which work in
almost all cases, but when these rules lead to an ambiguity or
inconsistency ("conflicts"), there is reluctance to abandon them, and
mistakes or impossibilities are produced.

It is possible that, like SEE, people use primarily local clues, 
and with less frequency more global information to disambiguate 
interpretations. I think that, in the presence of objects (in 2-dim 
line drawings, such as 'MOMO', for instance) not seen before, humans 
follow general rules not unlike those used by SEE to distinguish 
or decompose a scene into bodies. Rules that apply to all polyhedra 
have to be invoked, since in presence of previously unseen objects, 
humans can not use a model of the object. 

The more familiar an object is (or if we have reason to suspect it
or expect it), the faster we abandon the general rules and propose its
model as a possible explanation of part of a scene; we then jump to
a model matching routine (a la JDT {MAC TR 37}) that tries to fit the
model to part of the scene (to a semi-isolated body); general rules
a la SEE prevent us from overflowing with our model into other bodies,
and help us to deal with partially occluded bodies.
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ON NOISY INPUT 

The performance of our programs is analysed when the data has
imperfections consisting of (1) misplaced vertices, (2) missing 
edges, (3) spurious extra lines, (4) missing faces, (5) two vertices 
merged. 

The section 'Analysis of Many Scenes' contains results of SEE 
when applied to imperfect scenes. 

It is easy to predict the operation of SEE when the two- 
dimensional data supplied is clean , in the sense of being an accurate 
representation of the three-dimensional scene. 

In practice, of course, errors will occur in the data and it becomes
important to know how sensitive our program is to them.

SEE has some serendipity. Many of the imperfections in the
data do not cause mistakes in the linking procedure, or the link
misplacements are not enough to cause erroneous identification.

But mistakes are made. 

Here is how different types of imperfections are handled:

== The assignment of types to vertices is highly insensitive to errors
in the position of each vertex, except T's that become Forks or
Arrows. Two cures to the exceptions were found, only the first
of which is implemented:

(1) Allow tolerances in concepts of parallelism and colinearity.

(2) Allow a long but slightly twisted rectilinear segment to be
"straightened", as indicated in comments on scene R17.

== Missing edges are subdivided in three classes (discussed below);
two of them produce recoverable or detectable errors (hence,
susceptible of correction or prevention). It will be difficult to
detect if a segment of the third class is missing; these will produce
recognition mistakes.

== Additional lines, like the ones caused by edges of shadows, are not
easily detected as spurious or superfluous. Their presence mainly
produces a diminution in the number of useful links, thus sometimes
causing too conservative behavior -- i.e., proposition of too
many bodies.

== Whole faces may be missing. Ordinarily (see scenes L2, L9T)
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the remaining part of the body gets correctly identified. 

OBTAINING THE DATA 

The scenes analysed by our program in this thesis were obtained
by one of two methods:

By free drawing. A line drawing representing three-dimensional objects
was made; the coordinates of each vertex were accurately measured (or
computed) and the information was put in the 'Input Format'
previously described. Also the regions belonging to the background
were indicated as such.

These scenes have mnemonic names such as TRIAL, BRIDGE, etc. 

What kind of projection did you use? Were these isometric drawings?

Since no assumption is made on the rectilinear objects being drawn,
the drawings are not isometric, or perspective, or ... projections.
They could be any of them. It is not assumed that "we are dealing
with prisms, with faces of a body meeting at right angles (like the
corners of a cube)," or with convex objects. Neither the drawings nor
the program make any assumption of this type. If the reader wishes
to adopt the assumption specified above in quotation marks, then the
drawings will correspond to orthogonal projections of three-dimensional
scenes.

No support hypothesis is needed: if necessary, the objects could
be floating in a transparent fluid having their same density.

Arbitrary but not too complicated objects were cut 
from pine wood, with flat surfaces, and painted black. Their edges 
were painted white. By placing them on a black table (see first few 
pictures of this thesis) in different positions and combinations, 
three-dimensional scenes were created (see figure 'TEST OBJECTS'). 
Pictures were taken with high contrast film slightly under-exposed 
so as to render black everything but the lines. Diffuse illumination 
eliminated shadows. [Great help was received in the pictorial task
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Figure 'TEST OBJECTS 
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TEST OBJECTS' (Cont. 
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from Messrs. William H. Henneman, Devendra D. Mehta and David Waltz, 

and is here acknowledged]. The photographs were taken with a depression
angle from 45° to 90° (that is, looking down), 50 mm focal length
lens, 35 mm camera (standard equipment).

The size of the prints is approx. 8.5 by 11 inches (21.5 by 28 cm).
If some lines were not clear, they were retouched with white ink.

If some lines were missing, they were NOT added.

The pictures have names like L2 or R3, a letter and a digit. 

Most of them are stereographic pairs, taken with both cameras having 
parallel optical axes, and the sensitive film on the same plane. 

SEE only analyzes one scene at a time, so the left picture is not
consulted when SEE analyzes the right picture, and vice versa.

A transparent millimetric mesh is laid on top of the prints,
and the coordinates are read by eye and put by hand in the 'Input
Format' form. The thickness of each line is about 1 mm (see figure
'TEST OBJECTS'); typically, the size of a scene is 10 or 15 cm; a
minimum error of ± 1 per cent in the coordinates of a vertex is already
present. The slopes and directions of short segments suffer,
naturally, much greater errors. Also, if two vertices are too close
together (about two millimeters) they are merged and codified as one.

We are simulating the kind of mistakes that are likely to occur.

Also, some bias is introduced, no doubt, by the human operators.

[In reading the coordinates in most of the scenes, immense help was
given by Miss Cornelia A. Sullivan and Mr. Devendra D. Mehta; the
author acknowledges it.]

Irrespective of the generation method, the scenes that appear in
this thesis were drawn in their final form by the PDP-6 computer
through a Calcomp plotter, and then inked and finished by hand.

Thus, it is possible to perceive in many of them the imperfections
of the data that SEE had to analyze.






MISPLACED VERTICES 




The coordinates of a vertex may contain a small error or 'noise'.
How does this affect the type of a vertex? Does the type change? 


Type      Effect of a small error in position

L         Not affected.
FORK      Not affected.
ARROW     Not affected.
K         Transforms into MULTI.
X         Transforms into MULTI.
T         Transforms into ARROW or into FORK.
PEAK      Not affected.
MULTI     Not affected.


Many types are unaffected. Type K vertices transform into
MULTI, but since K's are seldom used by SEE, this is no big loss.

X's transform into MULTIs, and we lose two links here, which
makes SEE behave more conservatively. Also GOODT gets affected
(though not much).

The serious change is the T's that get transformed into ARROWs
or FORKs, when these T's are matching T's. Because they are used
for linking otherwise disconnected pieces of a body, their loss
generally implies the partition of a body into two. See figure
'DISCONNECTED'.
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Figure 'DISCONNECTED' 

The T's under discussion are marked by
small circles. In (a), the misclassification
of these T's into Arrows
or Forks does not break the occluded
body, which retains its unity thanks to
:1. In (b), the same misclassification
does break the occluded body, reporting
two objects instead of one, a possible
but less desirable answer. If the T's
are not matching T's, as in (c), their
misclassification does not matter.


The loss of matching T's makes the program more conservative
in some cases. In some sense (see 'Desirability Criterion') this
is tolerable.

What other perils does the misclassification of the T's bring?
We should worry if, due to errors caused by T's, the occluded
body joins the occluding one.

DESIRABILITY CRITERION.

(1) We would like a SEE that never makes
mistakes. Since this is not possible,
then

(2) We would like it to make mistakes of
only one kind: either join two
bodies that should be left separated
(intrepid, cavalier behavior), or
leave unattached two nuclei that
should be reported as a single object
(conservative behavior).

(3) Among the two, we prefer a conservative
SEE, because its errors will
be easier to correct (cf. Stereo
Perception).

The T's should not originate the reporting of :1-2-3 as
part of one body.

Each T, when perturbed, will go to one of these states: (N) normal,
unperturbed; (L) "left", E2 moves towards E1, becoming
a FORK, or (R) "right", when E2 moves away
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from E1, becoming an Arrow.

For three T's of an occluded body, 3^3 = 27 states are possible.
They are shown on the next page, in table 'THREE Ts'.
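
As a small check of the count, the 27 states can be enumerated
mechanically. The sketch below is in Python and only illustrates the
arithmetic; the function name three_t_states and the three-letter labels
(one letter per T, in the order of the three T's) are assumptions made for
the example, chosen to resemble labels such as R L N used in the text.

```python
from itertools import product

# N: normal, unperturbed; L: "left", the T became a Fork;
# R: "right", the T became an Arrow.
STATES = ("N", "L", "R")


def three_t_states():
    """Return the labels of all joint states of three perturbed T's."""
    return ["".join(combo) for combo in product(STATES, repeat=3)]
```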

How many of these 27 states will produce 
mis-links joining 1 with 3 or 2 with 3 
or 1 with 4 or 2 with 4 (none of the four 
regions is necessarily background) ? 

None. 

The reason is that (see description of NOSABO) a T or an Arrow 
or an L inhibit the link shown below. 


so that (a) An arrow in position (I) [or (III)] suggests linking 1 
with 4. This link is inhibited by the L at IV [or VI]. 
Example: Figure R L 1 in Table 'THREE Ts'.

(b) A Fork in position (I) [or (III)] suggests 

(i) linking 1 with 3. Inhibited because of the T or 
arrow in vertex II. 

(ii) linking 1 with 4. Inhibited because of the L in IV. 

(iii) linking 4 with 3. Depends on outside considerations. 
Discussed below. 

Example: L R L. 

(c) An Arrow in position (II) suggests linking 1 with 2. 
Inhibited or allowed according to vertex V. Example: RRL. 

(d) A Fork in position (II) suggests 

(i) linking 1 with 3. Link inhibited by the T or arrow 
of I. 

(ii) linking 2 with 3. Inhibited by the T or arrow in III. 

(iii) linking 1 with 2. Inhibited or allowed according to 
vertex V. 

Example: R L N. 

Thus, no link is possible, even under these "noisy" circumstances, 
between 1 and 3 or 2 and 3 or 1 and 4 or 2 with 4. That is, 
the 27 cases of table 'THREE Ts' are treated correctly. 
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A possibility of bad linking exists between 4 and 3 in this 
case, if two T's convert into forks and "help each other": 

Two links originate 
the joining of 4 
and 3. 

Rather than get involved in this sub-problem, we will point
out two solutions to the misplaced vertices: (1) by allowing some
tolerance in 'parallel' and 'collinear'; (2) by 'straightening out'
crooked or twisted segments. We explain.



(definition) a is equal within epsilon to b,
written a =(ε) b, iff |a - b| < ε. Generally, ε > 0.

Tolerances in collinearity and parallelism. Two lines are parallel if
the sine of the angle formed by them is smaller than SINTO
(sin θ =(ε) 0, with ε = SINTO).

Currently, SINTO = 0.15.

Lines ab and bc are colinear if
length ab + length bc =(ε) length ac, with ε = COLTO. Currently, COLTO = 0.05.

We have implemented these definitions. Better definitions exist. 
These definitions allow most small inaccuracies in the coordinates 
of vertices to pass unnoticed. Although they are giving reasonable 
service, they are only temporary, since by relaxing too much the 
criterion for parallelism and collinearity, strange things could 
happen (fig. 'CROSSED').
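
The two tolerance tests can be sketched as follows, in Python rather than
in the language of SEE. The constants SINTO and COLTO are the values quoted
in the text; the function names, the representation of points as (x, y)
pairs, and above all the decision to apply COLTO relative to the length ac
are assumptions of this sketch, since the text does not say whether the
tolerance is absolute or relative.

```python
import math

SINTO = 0.15  # tolerance on the sine of the angle (value from the text)
COLTO = 0.05  # collinearity tolerance (value from the text)


def is_parallel(p1, p2, q1, q2, sinto=SINTO):
    """True if segment p1-p2 is parallel to q1-q2 within sinto."""
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norm == 0:
        return False  # a degenerate segment has no direction
    # |cross product| / (product of lengths) is |sin| of the angle.
    return abs(ux * vy - uy * vx) / norm < sinto


def is_collinear(a, b, c, colto=COLTO):
    """True if length ab + length bc equals length ac within tolerance."""
    ab, bc, ac = math.dist(a, b), math.dist(b, c), math.dist(a, c)
    # Assumption: the tolerance is taken relative to the length ac.
    return abs(ab + bc - ac) < colto * ac
```

With these values, a line rising 1 unit in 10 still counts as parallel to
a horizontal one, which shows how lenient the current setting is.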



Fig. 'CROSSED' 


A too lenient definition of parallel
and collinear could give the following
matching T's: a to d, b to f,
c to e.


See also, in section 'Analysis of Many Scenes', the comments to L9 and R9T.

15 1 , I5£). 








Straightening twisted segments

The definitive cure is simple:
reassign the slope of bc to be that of ad, if bc is small, ad large
and the angles at b and c are close to 180°. See also comments to
figure R17. This has not been implemented. In this way, all cases of
table 'THREE Ts' will be solved. See also comments to scene R4.

Probably the preprocessor will automatically take care of this
rectification, since it may prefer to give a long segment ad instead
of three almost collinear shorter segments ab, bc, cd.
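
Since the straightening has not been implemented, the following Python
sketch is purely hypothetical. It tests the three conditions stated above
(bc small, ad large, angles at b and c close to 180°) and, when they hold,
proposes the single segment ad. The thresholds max_ratio and min_angle and
the function names are inventions of this example, not values taken from
SEE.

```python
import math


def angle_at(p, q, r):
    """Angle pqr in degrees, between 0 and 180."""
    v1 = (p[0] - q[0], p[1] - q[1])
    v2 = (r[0] - q[0], r[1] - q[1])
    n = math.hypot(*v1) * math.hypot(*v2)
    if n == 0:
        return 180.0  # coincident points: treat as straight
    cosang = (v1[0] * v2[0] + v1[1] * v2[1]) / n
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))


def straighten(a, b, c, d, max_ratio=0.2, min_angle=165.0):
    """Return the segment (a, d) if the chain a-b-c-d may be
    straightened, else None. Both thresholds are hypothetical."""
    bc, ad = math.dist(b, c), math.dist(a, d)
    if ad == 0 or bc > max_ratio * ad:
        return None  # bc is not small compared with ad
    if angle_at(a, b, c) < min_angle or angle_at(b, c, d) < min_angle:
        return None  # a real corner, not a slight twist
    return (a, d)
```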

Since the straightening of a segment replaces some known vertices 
(which we suppose inaccurate) by other idealized vertices, we may be 
introducing uncertainty, in the form of non verified hypotheses, to our 
data. The object in the scene could really be "crooked" or twisted. 


■> 




Fig. 'TWISTED' 

The object to the left is really bent as shown.
If we idealize it as in the right, we are falsifying
the information about it.


By replacing it by an idealized version, we may be creating
problems for its identification, when we want to assign a name to it.
But notice that the 'unbent' version or idealization is handier for
SEE.


If the information is very bad. Throw it away and read the scene
again. A simile indicates that the issue becomes one of allocation
of resources: if you receive a written message containing a few
wrong characters and missing words, you may use your brains and time
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to deduce the omitted portions (by employing the redundancy, for instance).
If the dispatch is very garbled, you might as well request
a new one.

Summary. It is known how to handle small inaccuracies in the position
of the vertices.


MISSING EDGES 


From time to time, an edge will fail to show up in the scene,
and the questions are (1) how much harm will be produced, and (2)
how can we detect and correct the anomaly. An example appears on
page 141.

Illegal Scenes

Lines that end abruptly produce illegal inputs,
suggesting that segments are missing.



In (a), a vertex has one edge. 

In (b), the network can be separated by erasing 
just one edge. 

Both are illegal scenes, indicating missing or 
extra lines. 


Also (Figure 'ILLEGAL', (b)) a region can not be a neighbor of
itself -- another irregularity that points to deficient data. Cf.
comments to scene R3.

These constraints can be nicely exploited by a preprocessor. 
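
A preprocessor could test the two conditions of Figure 'ILLEGAL' roughly
as in the Python sketch below. The scene is assumed to be given simply as
a list of edges, each a pair of vertex names; this representation and the
names components and illegal_features are assumptions of the example and
do not correspond to the 'Internal Format' of SEE. Bridges are found by
the naive method of erasing each edge in turn, which is adequate only for
small scenes.

```python
from collections import defaultdict


def components(vertices, edges):
    """Count the connected pieces of the network."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for start in vertices:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return count


def illegal_features(edges):
    """Return (vertices having one edge, edges whose erasure
    separates the network). Both lists are empty for a legal scene."""
    vertices = sorted({v for e in edges for v in e})
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    dangling = [v for v in vertices if degree[v] == 1]
    base = components(vertices, edges)
    bridges = [e for i, e in enumerate(edges)
               if components(vertices, edges[:i] + edges[i + 1:]) > base]
    return dangling, bridges
```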


Line proposer and line verifier. A line proposer is a program that
suggests places where a line can be missing; a line verifier is essentially
a precise line finder that searches for a line in only a small
portion of the scene, as told by the line proposer.
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In the body of this section we will develop several heuristics for 
use in a line proposer. The verifier is not discussed. 

Blum's line proposer. An algorithm has been designed by Manuel Blum
{1968}, that will detect many places where lines are possibly missing.
It suspects concave regions. An angle bigger than 180° originates a
search for the omitted line in directions parallel to the neighbor
edges (fig. 'BLUM'). It also originates searches along its own
edges. In other conditions, a vertical line is searched.

Fig. 'BLUM'

Region :1 is suspected to contain undetected lines,
because it is concave. Vertex V is chosen because
its internal angle is bigger than 180 degrees.
From it, Blum's proposer will suggest to the line
verifier to look for lines in directions VA' and
VB' (broken lines), parallel to the neighbor edges
A and B. It also searches (dotted lines) along
the continuation to lines C and D.

No harm is done by a bad proposer. Only some time is wasted. 
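
The first step of such a proposer, finding the vertices whose internal
angle exceeds 180°, can be sketched in Python as below. This is not Blum's
algorithm, only an illustration of the concavity test. The region is
assumed to be given as a list of (x, y) vertices in counter-clockwise
order, and the function name reflex_vertices is hypothetical. For each
concave vertex the sketch returns only the two directions that continue
its own edges; the directions parallel to the neighbor edges are omitted,
because they depend on a figure not reproduced here.

```python
def reflex_vertices(polygon):
    """Return [(vertex, [direction, direction]), ...] for every vertex
    of a counter-clockwise polygon whose internal angle exceeds 180
    degrees. Each direction continues one of the two edges meeting at
    the vertex beyond the vertex itself."""
    result = []
    n = len(polygon)
    for i in range(n):
        p, v, q = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        cross = (v[0] - p[0]) * (q[1] - v[1]) - (v[1] - p[1]) * (q[0] - v[0])
        if cross < 0:  # clockwise turn in a counter-clockwise polygon
            result.append((v, [(v[0] - p[0], v[1] - p[1]),
                               (v[0] - q[0], v[1] - q[1])]))
    return result
```

A convex region yields an empty list, so no search is proposed for it.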


Internal edges. If a missing line is totally internal to a body, and
is not detected by the line proposer, its absence will at most cause
conservative behavior in SEE. In some cases their absence does not
confuse SEE (figure 'MISSING').

The absence of most internal edges causes concave regions to appear
(fig. 'BLUM'). They will be detected by a line proposer.
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Fig. 'MISSING'


Cases where the disappearance of an internal 
line (dotted) does not separate the body. 

In (a), the object separates into two. 
This case is recognized by Blum's heuristics. 
Else, SEE could check for this configuration 
as a special case. 


Edges that separate two bodies are called external.

If undetected, their disappearance will cause 'intrepid' errors by
SEE, which are undesirable (see 'Desirability criterion' in page 212).
Two cases result: (1) Only part of the edge disappears; there is a possibility
of correction. (2) The whole edge is both external and missing
(and the scene is still 'legal'): a mistake will occur. See figure
'EXTERNAL EDGES'.

Case (1) Only part of an external edge disappears. It can be
detected because

(a) a concave region is generated, and

(b) the region has internal angles bigger
than 180° where a line "goes
through": ab is colinear with cd.
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( 1 ) 


Figure 'EXTERNAL EDGES'

A segment separating two bodies may disappear.
(1) If that segment is part of a larger segment,
it is possible to sense and correct the anomaly.
(2) If a whole external edge is missing, its
absence remains undetected, inducing a mistake
in SEE. In (i) an external edge disappears, and
creates an illegal figure.

Case (2) The complete edge is missing. Then (b) of case 1 fails
and detection is difficult.


SPURIOUS EXTRA LINES 

They are lines that "should not be there", such as those 
caused by edges of shadows. 


Fig. 'LIGHT AND SHADOW'
Each body becomes two; each one is recognized 
independently by SEE. Four bodies are found. 
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Shadows of rectilinear objects travel in planes that (in theory) 
part an object in two (or more) : the illuminated part, and the dark 
one. Each is a separate object by itself, according to our definition 
(see 'Several definitions of a body'), since they have plane boundaries. 
SEE should recognize them. 

In practice, we have not tried our program with scenes having
lines produced by shadows. A conservative behavior, like in figure
'LIGHT AND SHADOW', is expected.

Some shadows gradually diffuse; multiple lights cause multiple
shadows. These problems may have to be solved by assuming or computing
the direction or position of the light sources.


MERGED VERTICES 


Two vertices fused in one will produce diminution in the number
of useful links they report, since the resulting vertex will
be of type MULTI. Thus, conservative behavior is expected from SEE
in these cases (see Fig. L19, L17T, R17, L4, etc. The program does
well in them, when not too many coincidences are present).

SUGGESTION

It is possible to analyze the vertices of type
MULTI and try to decompose them in simpler types (compare figure
R19 with WRIST*). Read comments to R19 and L19.


CONCLUSION 

On scenes obtained from "real world" data, inaccuracies are 
expected, and it is required of SEE to work well despite them. 
Currently, the behavior of the program in these cases is not
discouraging, but is not extremely satisfactory, either. The 
additional work needed depends heavily on obtaining genuine 
test data, instead of the faked data used in the experiments 
described. 
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BACKGROUND DISCRIMINATION BY COMPUTER 


A program determines the regions that belong to the background 
of a given scene; that is, the regions that are not members of any 
of the bodies. Examples are given. 

Need. The program SEE needs to know which regions of the scene
belong to the background (cf. 'SEE, a program that finds bodies in
a scene'). At present, this information is supplied by the user,
as described in sections 'Internal Format' and 'Input
Format' of a scene.

In the current vision experiments, it is not difficult to 
determine the regions that form the background, since they are always 
black and homogeneous (see first few pictures in this thesis). But 
in more realistic scenes, there will be a great demand for a background 
finding program. 

Therefore, it is interesting to try to 
develop a program to separate the "ground" 
in the back from the objects in the 
"foreground", having a limited information 
consisting of the scene as described in 
section 'Internal Format', namely, vertices 
and edges. 

That is, we will use in this task only 
"geometric" properties. 

Such a program has been written, and works automatically under
the command of PREPARA, the function that converts a scene from its
'Input Format' to its 'Internal Format'. When the regions forming
the background are not supplied, PREPARA activates our program, 
named BACKGROUND, and these regions are searched for; otherwise, 

SEE is supplied with the background regions as declared in 'Input 
Format'. 
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Example. Scene 'HARD'. The results obtained are 


(SUSPICIOUS ARE NIL)
THE BACKGROUND OF HARD IS (:34 :36 :35)
(:34 :36 :35)

Three regions are found to be part of the background: :34, :36,
and :35. That is correct.

We now proceed to describe the subroutines that make such
identification possible.

Suspicious. In a first pass, we collect the regions that "may be"
background, and call them "suspicious regions". Regions that are
not suspicious are LIMPIO (clean).

Ideally, if a region :R contains L's, FORKs, ARROWs or T's in
the position below, it is not a part of the background.



(I) (II) (III) (IV)


FIGURE 'BACKGROUND'

In an idealised situation, :R can not be part of the
background: it is clean, or free of suspiciousness.
:R will be called 'LIMPIO' (clean).
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(I) means that the background [almost] never is the internal
part of an 'L' (the region containing the angle smaller than
180 degrees).

(II) means that the background does not contain FORKs. 

(III) means that the background is not in the "inside" of an ARROW
(the background is not a 'proper' arrow).

(IV) means that the background can not be the flat region of a 'T';
this in turn means that a body can not disappear under the background
and then reappear at some other point:

:3 


:3 is not the background. 

We reinterpret rules (I)-(IV) as follows:

(I) A region "inside" an L is LIMPIO (clean). 

(II) A region containing a fork is LIMPIO. 

(III) A region "inside" an arrow is LIMPIO. 

(IV) A region "on the flat side" of a T is LIMPIO. 

Clean Vertex (definition). A vertex is clean with respect to a region
if it indicates, through rules I-IV, that such region is LIMPIO.
For instance, K is clean for :1 and for :2,
since (III) indicates that :1 and :2 are LIMPIO.
K is not clean for :3.

These heuristics are not 100 per cent infallible; also, in a
moderately complicated scene, coincidences of vertices are bound to
occur, originating violations to I-IV. For instance, in figure 'CORN'
(page 150), vertex UU is a Fork belonging to the background, in contradiction
with (II).
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For completeness, we present a violation to each one of rules I-IV:



FIGURE 'VIOLATIONS' 

:1 is the background. In all four cases,
vertex V violates rule specified at the
bottom of figure. They are rare cases.

The situation indicates that rules I-IV
provide noisy information, which has to
be dealt with carefully. That is what is done.

The vertices of each region are analyzed under rules (I)-(IV).
To allow for coincidences of vertices and rare cases (like those in
figure 'VIOLATIONS'), it is permitted for a suspicious region to
have a small number of clean vertices.

The number of clean vertices is compared with a quantity that
is a small fraction of L (the number of vertices on the boundary);
currently, that fraction is L/9.

== If the number of clean vertices, that is, vertices satisfying
I-IV, is bigger than L/9, we call that region LIMPIO ("clean").
In addition, (a) If L is large (bigger than 25, currently),
that region is BIGFACE, such as :21 of
scene L19 (page 144);

(b) Otherwise, it is only LIMPIO (normal case).

== If it is not bigger than L/9, then it is SUSPICIOUS. Also,

(a) If L is large (bigger than 25), the region
is BACKGROUND,

(b) Otherwise it is only SUSPICIOUS (normal case).
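
The thresholds just described amount to a two-way test followed by a size
test, which can be written compactly. The Python sketch below is an
illustration, not the function SUSPICIOUS of the listings; the name
classify_region and its parameters are hypothetical. The numbers 9 and 25
are the current values quoted in the text. The count of clean vertices is
assumed to have been obtained already from rules (I)-(IV).

```python
def classify_region(clean, boundary, fraction=9, large=25):
    """Classify a region from its number of clean vertices and the
    number L of vertices on its boundary.

    clean * fraction > boundary is the test "clean > L/9" written
    without division, so no rounding takes place.
    """
    if clean * fraction > boundary:
        return "BIGFACE" if boundary > large else "LIMPIO"
    return "BACKGROUND" if boundary > large else "SUSPICIOUS"
```

A region with nine boundary vertices thus needs two clean vertices to be
LIMPIO, one more than one vertex of each nine.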








That is, a region LIMPIO has to have at least
1 + [one vertex of each nine]
"clean" vertices.

Example. Region :3 has four 'clean'
vertices (four vertices indicate that :3
is LIMPIO). It can not be SUSPICIOUS.

(This scene is correctly analysed by SEE)
All the three vertices of :1 are not clean;
:1 will become Suspicious (a candidate for
background). Five of the seven vertices of
:2 are clean, so :2 is LIMPIO. Note that
vertex C is clean for :2 and not clean
for :1.


For example, when we apply the function SUSPICIOUS (see listings)
to every region of scene SPREAD, the suspicious regions turn out to be:
Suspicious only: :35 :18 :34 :2 :3 :12 :11 :33 :37
:47 :48 :46.

Background: :48.


By analysis of its vertices, each region is either LIMPIO or 
SUSPICIOUS. The suspicious regions with more than 25 vertices are 
classified right away as BACKGROUND: a suspicious region with many 
edges is probably background. 

The selection is done entirely using "local" properties: a 
region is classified according to information supplied exclusively
by its own vertices. 







FIGURE 'SPREAD'

Each region is classified as LIMPIO,
SUSPICIOUS or BACKGROUND.

More global indications are used to decide which of the suspicious
regions are LIMPIO, and which ones are BACKGROUND:

== Since two background regions can not be contiguous (the background
can not be neighbor of itself), suspicious regions that
are contiguous with the background are cleaned and put in the
LIMPIO status.

In our example, :48 is background and therefore its suspicious
neighbor :18 gets cleaned and becomes LIMPIO.

== Links are established through the matching T's. We call them
b-links.

Ideally, a suspicious region b-linked to a LIMPIO region
gets cleaned, a suspicious region b-linked to the background gets
converted to background too.
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Idealizing, suspicious region :1 
becomes LIMPIO, and suspicious 
region :2 becomes background. 

A more complicated procedure is 
actually used. 


In practice, we allow for small errors as follows:

For each suspicious region, we notice if it is b*linked
to background (BA), suspicious (SO), or Limpio (LI).

BA -- --    If it is b*linked to background regions, we
            change it to Background, except if it has a
            background as neighbor, in which case we do
            nothing and continue.

() SO LI    If not b*linked to background, but b*linked both
            to Suspicious and Limpio regions,

            (1) If LI < SO, continue, do nothing.

            (2) If LI ≥ SO, classify this region as
                Limpio (LI is the number of LIMPIO
                regions b*linked to the current region
                under consideration).

() SO ()    If b*linked only to suspicious, continue, do
            nothing.

() () LI    If b*linked only to Limpio, change it to Limpio.

            Note: Sometimes I write Limpio, sometimes LIMPIO;
            they mean the same.

() () ()    If not b*linked, continue, do nothing.

We keep applying these rules until no change is observed. In 
this way, we have eliminated several suspicious regions. 
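
As an illustration only, the rules can be written as a small fixed-point loop. The sketch is in modern Python, not the LISP of the listings, and the names `relax`, `blinks` and `neighbors` are hypothetical. It assumes that b*links are symmetric, that SO and LI count the suspicious and Limpio regions b*linked to the region under consideration, and that regions are visited in numerical order. The b*links in the data are those quoted in the SPREAD walk-through; the only contiguity used is that of :18 with :48, so the neighbor table is deliberately incomplete.

```python
def relax(status, blinks, neighbors):
    """Apply the b*link rules repeatedly until no change is observed.

    status:    dict region -> "BACKGROUND" | "SUSPICIOUS" | "LIMPIO"
    blinks:    dict region -> set of regions b*linked to it
    neighbors: dict region -> set of contiguous regions
    """
    status = dict(status)
    changed = True
    while changed:
        changed = False
        for r in sorted(status):
            if status[r] != "SUSPICIOUS":
                continue
            linked = [status[q] for q in blinks.get(r, ())]
            ba = linked.count("BACKGROUND")
            so = linked.count("SUSPICIOUS")
            li = linked.count("LIMPIO")
            touches_bg = any(status[q] == "BACKGROUND"
                             for q in neighbors.get(r, ()))
            new = None
            if ba:                    # rule  BA -- --
                if not touches_bg:
                    new = "BACKGROUND"
            elif li and li >= so:     # rules () SO LI (LI >= SO) and () () LI
                new = "LIMPIO"
            if new:                   # every other case: do nothing
                status[r] = new
                changed = True
    return status


# b*links quoted in the SPREAD walk-through, treated here as symmetric.
PAIRS = [(11, 9), (11, 3), (12, 10), (46, 48), (34, 48),
         (37, 4), (35, 34), (2, 35)]
blinks = {}
for a, b in PAIRS:
    blinks.setdefault(a, set()).add(b)
    blinks.setdefault(b, set()).add(a)

status = {r: "SUSPICIOUS" for r in (35, 34, 2, 3, 12, 11, 33, 37, 47, 46)}
status.update({48: "BACKGROUND", 18: "LIMPIO",
               9: "LIMPIO", 10: "LIMPIO", 4: "LIMPIO"})
neighbors = {18: {48}, 48: {18}}   # only contiguity stated in the text

result = relax(status, blinks, neighbors)
print(sorted(r for r, s in result.items() if s == "SUSPICIOUS"))
```

The loop terminates because every change takes one region out of the suspicious set, and no rule ever puts a region back into it.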

In SPREAD, the suspicious regions were :35, :18, :34, :2, :3,
:12, :11, :33, :37, :47, :48, :46. :48 is known to be the background
(that was done on page [illegible]), so it is no longer suspicious. :18
is a neighbor of the background (:48), and got cleaned on the
page before this one.

:11 is b*linked with the LIMPIO :9 and with the suspicious :3.
Therefore, :11 changes to LIMPIO.

:3 is b*linked with the Limpio :11, so the suspicious :3
becomes Limpio.

:12 is b*linked to the Limpio :10, and gets cleaned.
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:46 is b*linked to the background :48, and gets made 
background, since :46 is not, at this moment, a neighbor of 
background. 

:34 is b*linked to the background :48, and gets made
background, since :34 is not a neighbor of background.

:37 is b*linked to the LIMPIO region :4, and transforms
into LIMPIO.

:35 is b*linked to the region :34, which is background,
so that the suspicious region :35 becomes background instead.

:2 is a suspicious region b*linked to the region :35, which
is part of the background. According to our rules, :2 becomes
part of the background.

At the end, only regions :33 and :47 remain suspicious:

(SUSPICIOUS ARE (:33 :47))

- We collect all these 'stubborn' suspicious regions and label
  them background, except those which are neighbors of background.

  [SUGGESTION] A better procedure may be to make the exception for
  those regions that are neighbors of suspicious regions.
  That is, two neighboring suspicious regions prevent
  each other from becoming background. I have not explored
  this possibility.

  In the example SPREAD, :33 and :47 are made background.

- If no region is background at this point, make one of the
  "big-faces" background. There is room here for improvement.

- If no background yet, make background the region with most
  vertices. This is not yet implemented.

In our example, the (final) background regions are:

:33 :47 :35 :34 :2 :48 :46.     <-- BACKGROUND OF 'SPREAD'.
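
The closing steps can be sketched in the same spirit. This is an illustration in modern Python under stated assumptions, not the code of the listings. The set of background regions is frozen before the stubborn regions are examined. A stubborn region that touches the background is simply left as it is. The "big-faces" step is omitted because it is not defined in this section. Only the last fallback (region with most vertices), which the text says is not yet implemented, is shown. The names `finalize`, `neighbors` and `n_vertices` are hypothetical.

```python
def finalize(status, neighbors, n_vertices=None):
    """Turn the remaining 'stubborn' suspicious regions into background."""
    status = dict(status)
    # Assumption: "neighbors of background" refers to the background
    # as it stands before this step begins.
    background = {r for r, s in status.items() if s == "BACKGROUND"}
    for r, s in list(status.items()):
        if s == "SUSPICIOUS" and not (neighbors.get(r, set()) & background):
            status[r] = "BACKGROUND"
    # Last resort, described in the text as not yet implemented:
    # with no background at all, pick the region with most vertices.
    if "BACKGROUND" not in status.values() and n_vertices:
        status[max(n_vertices, key=n_vertices.get)] = "BACKGROUND"
    return status


spread = {33: "SUSPICIOUS", 47: "SUSPICIOUS", 18: "LIMPIO",
          35: "BACKGROUND", 34: "BACKGROUND", 2: "BACKGROUND",
          48: "BACKGROUND", 46: "BACKGROUND"}
final = finalize(spread, neighbors={18: {48}})
print(sorted(r for r, s in final.items() if s == "BACKGROUND"))
```

With the SPREAD statuses reached so far, and no contiguity recorded for :33 and :47, both regions join the background.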
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Other examples of background finding 


Scene CORN 


[Program printout for scene CORN; illegible in the scan.]
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
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Scene BRIDGE 


[Program printout for scene BRIDGE; illegible in the scan.]
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Scene MOMO 


One mistake (:31) is produced here.

[Program printout for scene MOMO; illegible in the scan.]



FIGURE 'MOMO'
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The problem is ambiguous. As in the case of body isolation (section
'The Concept of a Body'), the problem of determining the regions that
belong to the background of a scene (regions that belong to no body)
is ambiguous; many solutions are possible, as long as no two
background regions are contiguous.

Among the multitude of solutions there exists a preferred one,
which is "the" standard (common, familiar) interpretation chosen
by people.

Our program also tries to choose, among the many solutions,
the standard one.

Summary

A lenient algorithm finds regions (by analyzing the types of
their vertices, and their neighborhood relations) that may possibly
be background, and labels them "SUSPICIOUS". With the idea of
re-classifying the suspicious regions as 'LIMPIO' (clean, not
background) or 'BACKGROUND', a system of b*links is introduced. These
b*links provide more global information about the scene.

Members of the suspicious set are assigned to one of the other
two sets, while the algorithm tries to minimize the b*links
between Background and Limpio regions.


Conclusion 


Fair results are obtained with the algorithm just
described. Sometimes, regions are obtained as Background that
are genuine components of a body ("Limpio") and vice versa.

Refinements are needed, but since in our present vision experiments
the background is a homogeneous black area (see first few pictures
of this thesis), no emphasis is placed on them right now.
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STEREO PERCEPTION 


Summary. So far we have discussed the identification of objects in a
scene and ignored the problem of locating them in three-dimensional
space.

There are several ways to achieve this. We will discuss here one
of them: the use of more than one view of the same scene.

A natural first step is to establish the correspondence between
points in the two views; that is, given a point in one scene (left),
to find the corresponding point in the other scene (right). Theorems
S-1 below and S-2 on page 234 express criteria for this "stereo
matching".

SEE can independently decompose the left and right scenes into the
bodies forming them, leaving the problem of determining which of the
objects in the right scene corresponds to an object in the left
scene. This can be done because each object will appear
in both views with the same maximum height and minimum height (highest
and lowest values of the y-coordinate of points belonging to that
object); comparisons are easily made by replacing the objects by
"intervals" consisting of these two numbers.

Further disambiguation can be achieved by the use of the function
(WHERE X_L Y_L X_R Y_R), which determines the (x, y, z) 3-dim position
of a point whose two 2-dim locations (X_L, Y_L) and (X_R, Y_R)
are known {Griffith, AI Memo 143}.


THEOREM S-l 

If both cameras are identical, their optical axes parallel, and
the films or sensitive surfaces or retinas lie in the same plane,

then a simple necessary condition for two image points, one in
each retina, to have come from the same 3-dim point, is that both
image points (left and right) have the same y-coordinate,

measured in the direction perpendicular to the line joining the
optical centers.
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Figure 'POINTS' 

Given two images of the same scene, before 
we can proceed to situate it in 3-dim space, 
it is necessary to know which points of the 
left scene correspond to points of the right 
scene: we have to discover the genuine pairs 
in it, a small subset of the cartesian product
(a, b, c, d) X (e, f, g, h). It is
desirable to have an algorithm that avoids an 
exhaustive search on this product. 


Genuine Pair (definition). A pair of points (P_L, P_R) produced by a
real 3-dim point of the scene in consideration.

Theorem S-2 below gives conditions that a genuine pair must meet. 
A particularization will produce theorem S-l above. 

THEOREM S-2   The left image P_L and the right image P_R of a point P
have associated with them a variable, computable from
(X_L, Y_L) or from (X_R, Y_R), that will acquire the same
value on P_L and on P_R. It is invariant under change
of scene.

For the case where the optical axes are parallel,
this variable is simply the y-coordinate (Y_L = Y_R) or
height of the image.

For the case where the optical axes meet, this
variable is γ, an angle that plane P_L-C_L-P-C_R-P_R makes
with Γ, the plane containing the optical axes.

Any monotonic function of γ will be just as good
(cf. figure 'GENUINE PAIRS').

From the theorem, the algorithm (referred to in fig. 'POINTS') that
we may use to establish correspondence between points in the two
views is:
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Compare only points with the same γ
(or the same y-coordinate).

Points with different γ cannot
come from a genuine pair.


For each body, the knowledge of the 3-dim location of a few of its
vertices will be sufficient to position that body in real space,
achieving in this way the goal of this section.

See Digression 1 in section 'The concept of a body' , for a 
different approach. 



Figure 'γ-PARAMETRIZATION'

From geometrical considerations and the coordinates of a
point P_L in L, it is possible to attach to the line A-P_L
an angle γ. Similarly, an angle is obtained for lines of R.
It can now be said that a genuine pair (P_L, P_R) must
have the same γ's for P_L and P_R.

γ is a physical quantity, namely the angle that
the plane passing by the image P_L and the optical
centers C_L and C_R makes with the "horizontal" plane Γ
(Γ contains the optical axes). Clearly, for P_L and
P_R to be produced by a point P in 3-dim space, the γ
of P_L must be equal to the γ of P_R. This is a necessary
condition that is easy to check.


A real point P of the scene produces a left image P_L (which has
a certain value of γ) and a right image P_R with the same value of γ
(figure 'γ-PARAMETRIZATION').

Thus, given a point in one scene, we
have to search for its genuine pairs
in the other scene among the points
with its same γ. They will be found
along a straight line through A or B.

Parametrization of the scene is possible not only by using γ;
a monotonic function of γ will do.

For computational efficiency, it may be advisable to store the
points of the scenes into arrays according to the value of their γ's.
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When the optical axes are parallel. In this case, points A and B on
line C_L-C_R (fig. 'Genuine Pairs') travel to infinity, and lines P_L-A
and P_R-B become horizontal (parallel to C_L-C_R). The situation looks
like


[Figure: the two images L and R side by side; the points of a
genuine pair lie on the same horizontal line.]


A genuine pair (P_L, P_R) will
have the same y-coordinate for
both of its elements (10.0 in
this case).

So that, given a left image point P_L, we have to search only
among the points of R with its same height, to find "the" P_R that
will make a genuine pair (P_L, P_R).

But several genuine pairs may be found, because on each
horizontal line on R many points may lie.
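
A minimal sketch of this search for the parallel-axes case follows, in modern Python rather than the LISP of the listings. The right-hand points are stored sorted by height, so that each left point is compared only with the points on (nearly) the same horizontal line. The tolerance `tol`, the function name and the coordinates are assumptions made for illustration; the point names a..h are borrowed from figure 'POINTS'.

```python
import bisect


def candidate_pairs(left, right, tol=0.5):
    """Map each left point to the right points at the same height.

    left, right: dict name -> (x, y) image coordinates.
    tol: allowed difference in y (an assumption; the theorem itself
         asks for equal y-coordinates).
    """
    by_height = sorted((y, name) for name, (_x, y) in right.items())
    heights = [y for y, _name in by_height]
    out = {}
    for name, (_x, y) in left.items():
        lo = bisect.bisect_left(heights, y - tol)
        hi = bisect.bisect_right(heights, y + tol)
        out[name] = [n for _y, n in by_height[lo:hi]]
    return out


# Hypothetical coordinates, for illustration only.
left = {"a": (3.0, 10.0), "b": (5.0, 20.0)}
right = {"e": (1.0, 10.0), "f": (2.5, 10.0), "g": (4.0, 20.0), "h": (0.0, -40.0)}
print(candidate_pairs(left, right))
```

Point a keeps two candidates (e and f), which is the remaining ambiguity noted above; it has to be resolved by other means, such as the body intervals of the next section.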


USE OF SEE IN STEREO PERCEPTION 

We can use the invariance of the variable described in Theorem
S-2 to locate objects in three-dimensional space, from a pair of
stereo views (we will suppose parallel axes; the other case is treated
similarly) as follows:

(1) Make an analysis of the left scene with SEE, identifying the
bodies.

(2) Do the same for the right scene.

(3) Reduce each body to an interval formed by two numbers, its 
maximum and minimum height, specifying "closed" if the absolute 
extremal of the body is known, "open" if not. 

In this way we reduce each scene to a set of intervals (see 
figure 'INTERVALS'). 
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Each body is reduced 
to an interval. 

(4) Use these intervals to select which left body will go with what 
right body. The answer is simple (because it is unique) even 
in moderately crowded scenes. 

It is simple to take into account the fact that an open 
end of an interval indicates that the interval can extend 
further at such end. 

Sources of difficulties are:

(a) Two bodies have the same interval, meaning they have identical 
maximum heights and minimum heights. This is possible. 



Quite easy: reduce some faces to intervals and compare them. 

(b) A body is seen in left scene but not in right scene (figures 
L12, R12). 

(c) SEE partitions one body in two in one scene, but not in the 
other. 

The "open" and "close" indications will help here. 

Also, remember that we are using, when comparing these intervals,
just a very small part of the total information concerning each body.
When the selection is narrowed down to two or three candidates
["left-body 1 is either right-body 2 or right-body 5"], one can use
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(1) the WHERE function of Griffith (op cit), 

(2) as in (a) above, the intervals for each face of the
objects, so as to choose as "genuine pair" those two
objects with more agreement in the intervals of their
faces;

(3) perhaps a face of unusual shape is enough for discri¬ 
mination, if it appears both in left and right scenes, 
or the number of vertices below the center of gravity, 
or ... 


summary

In summary, I should like to point out that, while much
has been stated within the somewhat constricting framework
of this article, much remains to be stated. Certain, but
not all, important classes of presentations have been
treated, and there remain horizons as yet unexplored.
Conceivably, the author will attempt, ex nihilo nihil fit, to
establish a more general perspective in the course of a
subsequent article. [citation illegible]

Also, the reader is referred to other
articles on the same topic.
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FIGURE 'R12'

For the pair L12 - R12, caution
should be exercised, because a
hexagonal prism disappears from
L12 and a brick appears in R12.









Scene L10 - R10


SEE analyzes independently (pages [illegible]) the left
and right scenes, obtaining the following bodies:

LEFT SCENE (L10)     (BODY 1. IS :5 :1 :4 :12)

                     (BODY 2. IS :6 :15 :7 :11 :14)

                     (BODY 3. IS :8 :9 :10 :3)

                     (BODY 4. IS :2 :13)

RIGHT SCENE (R10)    (BODY 1. IS %:3 %:5 %:6 %:14)

                     (BODY 2. IS %:13 %:1 %:11 %:9 %:15)

                     (BODY 3. IS %:8 %:2 %:10)

                     (BODY 4. IS %:4 %:7 %:12)

For each of the eight bodies, we compute its minimum height and its
maximum height, obtaining the following intervals:

L10                                R10

:5 :1 :4 :12        [66,105)      [67,154]    %:3 %:5 %:6 %:14

:6 :15 :7 :11 :14   [79,120]      [78,119]    %:13 %:1 %:11 %:9 %:15

:8 :9 :10 :3        [68,152]      [65,103)    %:8 %:2 %:10

:2 :13              [21,82)       [22,82)     %:4 %:7 %:12

These intervals are compared (left with right), trying to find
pairs with discrepancies between their values tolerably small [if the
interval has an open end, differences can be larger]. For 'L10 - R10',
these are

[66,105) - [65,103)

[79,120] - [78,119]

[68,152] - [67,154]

[21,82) - [22,82)

that corresponds to the following identification of bodies:

:5 :1 :4 :12         corresponds to   %:8 %:2 %:10
:6 :15 :7 :11 :14    corresponds to   %:13 %:1 %:11 %:9 %:15
:8 :9 :10 :3         corresponds to   %:3 %:5 %:6 %:14
:2 :13               corresponds to   %:4 %:7 %:12
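
The comparison of intervals can be sketched as follows, in modern Python rather than the LISP of the listings. The discrepancy measure (sum of the differences of the two ends, with the difference at an open end counted at half weight) and the exhaustive search over assignments are assumptions made for illustration; the text only asks for discrepancies that are "tolerably small". The sketch also assumes the same number of bodies in both scenes, which difficulties (b) and (c) show is not always the case. The intervals are those of L10 and R10.

```python
from itertools import permutations

# (minimum height, maximum height, upper end open?) for each body.
LEFT = {1: (66, 105, True), 2: (79, 120, False),
        3: (68, 152, False), 4: (21, 82, True)}
RIGHT = {1: (67, 154, False), 2: (78, 119, False),
         3: (65, 103, True), 4: (22, 82, True)}


def discrepancy(a, b, open_weight=0.5):
    """Difference between two intervals; open ends are penalized less."""
    lo = abs(a[0] - b[0])
    hi = abs(a[1] - b[1])
    if a[2] or b[2]:
        hi *= open_weight   # an open end may extend further
    return lo + hi


def match_bodies(left, right):
    """Pair every left body with a right body, minimizing total discrepancy."""
    lkeys, rkeys = list(left), list(right)
    best = min(
        permutations(rkeys),
        key=lambda p: sum(discrepancy(left[l], right[r])
                          for l, r in zip(lkeys, p)),
    )
    return dict(zip(lkeys, best))


print(match_bodies(LEFT, RIGHT))
```

For these eight intervals the best assignment pairs left bodies 1, 2, 3, 4 with right bodies 3, 2, 1, 4, which is the identification listed above. An exhaustive search is acceptable only because a scene contains few bodies.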

Once these correspondences between objects in the two images are
found, the function (WHERE ...) {Griffith} will position these bodies
in three-dimensional space, achieving our goal.
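
For the parallel-axes case, the positioning of one point can be illustrated with the usual pinhole triangulation. This sketch is not Griffith's WHERE function, whose definition is not reproduced in this thesis section. It assumes identical cameras of focal length `focal`, separated by `baseline` along the x axis, with the left optical center at the origin and image coordinates measured from each optical axis. All names are hypothetical.

```python
def locate(x_left, y_left, x_right, y_right,
           focal=1.0, baseline=1.0, tol=1e-6):
    """Return (x, y, z) of the 3-dim point seen at the two image points."""
    # Theorem S-1: a genuine pair has the same height in both images.
    if abs(y_left - y_right) > tol:
        raise ValueError("not a genuine pair: heights differ")
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: no finite point in front")
    z = focal * baseline / disparity        # depth along the optical axes
    return (x_left * z / focal, y_left * z / focal, z)


# The point (1, 2, 4) projects to (0.25, 0.5) on the left retina and to
# (0.0, 0.5) on the right one when focal = baseline = 1.
print(locate(0.25, 0.5, 0.0, 0.5))
```

Applying such a function to a few matched vertices of each body is enough to place the body in space, as noted earlier in this section.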
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CONCLUSIONS 


LOOKING BEHIND 

When I started to work on these problems, the idea was to 
describe an object by using a model, and with this model in memory, 
to search the scene looking for sub-parts of it that would fit the 
description. 

This work ended (as far as this thesis is concerned) with a 
program that finds bodies without having a model of them. 

But that is good. 

We did not know at the beginning that this could be done. 


LOOKING AHEAD 


a. Suggestions for further work 

b. Comments 

c. Recommendations 

d. Summary 

e. Conclusions 

f. Evaluation 

g. Extensions and Implications 


All these matters are 
normally encountered 
grouped in a chapter 
at the end of the work 


I can only partially lump all these important matters in one 
final section; many times I cite them in context, that is, next to 
the figure or subject that evokes them, or with which they are most 
closely related. As a result, they are spread through the body of 
this dissertation. 


Also, 


(1) The box [SUGGESTION] appears throughout this thesis near a
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partially unsolved or partially formulated problem, and/or its 
partially outlined or partially new solution. 

(2) In page there is a list of such suggestion boxes. 

(3) The remaining portion of this section and, in general, the
sections close to the end of this work, abound in statements
of type (a.) through (g.).

(4) I have tried to start each section with a brief summary, and
end it with a summary or conclusion.

(5) The section 'Introduction' (page 10) specifies the problems
treated in this thesis, and the section 'Preliminary view of
Scene Analysis' (page [illegible]) produces a general view of
available methods.
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General notation 


To put, remove, etc., links, we
may develop a notation that will look like

    (WHEN A (Y A) (B :1 C :3 D :2)
          D (K (A F ..)) (A :3 E :4 F :2)
     THEN
          PUT LINK KIND 3 :3 :4
          NO LINK :1 :2 )

"When A is a vertex of type 'Y', and
D is a vertex of type 'K', and
A and D are joined as specified,

then

put a link of kind 3 between region :3 and :4, and
do not put a link between :2 and :1."

The general notation is

    (WHEN P E E')

"when predicate P is satisfied, evaluate expression E (execute
E), otherwise execute E' (which may be missing)".

In this notation, the predicate P corresponds to a geometric
pattern or configuration, and the expressions E and E' to the
establishment or removal of links.

In SEE, this part is handled by LISP functions (hand-coded),
one for each particular heuristic. The suggestion is to develop this
general notation, and an interpreter for it. This will speed up
programming and checking, but will slow down the execution to
some extent.
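
To make the suggestion concrete, here is a very small sketch of what interpreting (WHEN P E E') could amount to. It is written in modern Python rather than LISP, and it is not part of SEE. A rule is built from a predicate on the scene and two lists of link actions. The scene representation (a table of vertex types), the helper names `when`, `put_link` and `no_link`, and the reading of NO LINK as "remove any link between the two regions" are all assumptions. The geometric condition "A and D are joined as specified" is left out.

```python
def when(predicate, then_actions, else_actions=()):
    """Build a rule: run E when P holds on the scene, otherwise E'."""
    def rule(scene, links):
        chosen = then_actions if predicate(scene) else else_actions
        for action in chosen:
            action(links)
    return rule


def put_link(kind, a, b):
    """Action: add a link of the given kind between regions a and b."""
    return lambda links: links.add((kind, frozenset((a, b))))


def no_link(a, b):
    """Action: make sure no link of any kind joins regions a and b."""
    pair = frozenset((a, b))
    return lambda links: links.difference_update(
        {link for link in links if link[1] == pair})


# "When A is a vertex of type Y and D is a vertex of type K, put a link
# of kind 3 between :3 and :4, and do not put a link between :1 and :2."
rule = when(lambda s: s.get("A") == "Y" and s.get("D") == "K",
            [put_link(3, ":3", ":4"), no_link(":1", ":2")])

links = {(1, frozenset((":1", ":2")))}
rule({"A": "Y", "D": "K"}, links)
print(links)
```

Each heuristic then becomes a piece of data (a predicate plus actions) instead of a hand-coded function, which is the gain in programming speed and the loss in execution speed mentioned above.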




Use. The main use of the new notation or language is for trying
new heuristics. Actually, it is not difficult to hand-code the
new heuristic in LISP (see function EVERTICES in listings), because
everything reduces to calls to NOSABO, THROUGHTES, GEV, SUMS, etc.

I was thinking that a simple MACRO of Lisp could transform from
notation (WHEN P E E') to LISP functional calls.

Since what the notation or language is really doing is expressing
as a linear string a two-dimensional configuration, a more
ambitious project would be to use the light pen and draw this
configuration, and then have our interpreter or compiler produce the
LISP program. This may look a little like AMBIT-G {Christensen}.
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Assigning a name to an object 


Problem. SEE has separated a scene into bodies. What are they?

Is there a pyramid among them? Where are the parallelepipeds?

To answer this, information can be supplied to the program, in
the form of a symbolic description or model of the object we are
trying to find. A model is an idealized account of a class of objects,
all receiving the same name, like "triangular pyramid" or "house".
Models may have parameters that acquire values after a given instance
of the model has been found in a scene. Examples are "height" or
"length of bottom side".

Some programs that follow the above procedure to name objects 
in a scene are described and discussed in a Master's Thesis {Guzman}. 
There are difficult problems to be solved if we are to make the 
system able to recognize occluded objects in many situations. 

One could, of course, bypass SEE and look for particular objects,
as it is done by Polybrick {Hawaii 69}, a program that finds
parallelepipeds.






Do not use over-specialized assumptions. Use more information.
In trying to solve a problem, people will apply quite different
methods. They may also make quite different assumptions, some of
which may not hold. Due to particular experience, environment,
preferences, etc., some subjects may be using over-specialized
assumptions, instead of requesting more data, more information to
solve the problem. We may bias our views and risk arriving at
conclusions (of the "common sense" type) which are valid only on
restricted segments of populations, or in particular conditions or
situations.

Holes. For instance, if most of the readers of this thesis [technical
specialists, who have learned to read, are interested in graphical
processing and computers, etc.; who may not be considered a
representative cross-section of Homo Sapiens] perceive "objects" a, b
and c of figure 'HOLES' as holes {Winston}, we may be tempted to
conclude that this is a general property, and rush to write a



Fig. 'HOLES'


The idea that objects a, b, c
have to be interpreted by all
men, and hence by a program, as
holes in the larger box, is
dangerous {cf. AI Memo 163}.


subroutine to find such orifices. Perhaps other sectors of our
population would simply say, with respect to a, b, c, of figure
'HOLES' that "there is not enough information to make a decision"
(see also section 'On optical illusions'). Or they may come with
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different answers, using their set of assumptions which may be
different from ours, since their experience is different too.
The Ames Room (see Box, page 101) and Gregory (see Box) warn us
of this.


Another example of over-specialization. For people familiar with
Descriptive Geometry, it is easy to see that figure 'DESCRIPTIVE' (I)
shows a straight line in the first octant. For them, indeed, it
is easy to visualize this line in three dimensions and have a fairly
good idea of its position and orientation in space, just from
figure (I).

Other persons would need a more conventional figure, such as
figure 'DESCRIPTIVE' (II), to visualize the same line, to get the
same idea.

What happened was that the first group of persons were using
specialized knowledge, their minds were trained, figure (I) was


Conclusion 


Before looking for heuristics and shortcuts, before making
assumptions, deductions, etc., let us be sure that there is enough
data to solve our problem.
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Human perception versus computer perception. Given a two-dimensional
line-drawing of a three-dimensional scene, the problem of finding
bodies in it is inherently ambiguous: many 3-dim scenes can generate
the same 2-dim scene.

Multiple solutions are possible. Moreover, the metatheorem
of page [illegible] guarantees that a solution always exists, and
provides ways to construct it. We call this solution "trivial"; in
effect it is trivial to write a computer program that will invariably
find it.

From the multitude of possible solutions, human beings select
one, which is* different from the trivial, and call it "normal"
or "common" or "standard" or "reasonable" interpretation of the
scene.

Our program SEE also selects one of the many solutions.

How does its selection compare with the human choice?

- When the scene is "clear", in the sense of evoking human
  unanimity, SEE will* also select that same answer. Example:
  Figure 'TOWER'.

- As the scene or drawing gets complicated or ambiguous, mortal
  behavior deteriorates; opinions split, optical illusions may arise
  (indicating contradictory evidence perceived), several
  plausible answers are emitted.

  The answer of SEE in these cases will* be found among the
  humanly plausible selections. In some cases, it may not agree
  with the majority.

- Finally, people make mistakes. They will see an object that is
  not there, or will fail to see an object, or classify it as
  "impossible".

  But SEE also errs. It sometimes succeeds where people fail;
  more often it is the other way around.


* In an overwhelming majority of cases.
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TABLE 


"ASSUMPTIONS" 


ASSUMPTIONS MADE BY THE PROGRAM 


These assumptions have to be obeyed for SEE to give good results:

- The objects are three-dimensional solids formed by planes. (1)
  No needles or cardboards allowed.

- They produce a two-dimensional image or projection where all
  lines are straight. (2)

- Faces have no drawings, marks, labels, etc., imprinted on them.

- Objects do not have holes in them.


(1) See section 'On optical illusions' for conditions for partial
    lifting of this assumption.

(2) See section 'On curved objects' for conditions for partial lifting
    of this assumption.


ASSUMPTIONS NOT MADE BY THE PROGRAM

These assumptions are not necessary for the correct functioning of SEE;
it will work well with or without them.

- Only prisms are allowed.

- The scene is a parallel projection, or isometric drawing.

- The objects are convex.

- The model or description of the object has to be known to SEE.

- The objects have to appear unoccluded or unobstructed in the view.

- The objects have "weight" in the vertical direction and will
  fall if not supported.

- The background is known in advance (See 'On background
  discrimination by computer').

I repeat, these assumptions are NOT obeyed by our program.
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ANNOTATED LISTING OF THE FUNCTIONS USED 


You do not have to know these things in order to use SEE (reading
'How to use the program' on page [illegible] is enough) or to
understand what it does (it is explained in 'SEE, a program that finds
bodies in a scene', page [illegible]); these things are put here merely
for completeness and to make easier the understanding of the inner
workings of SEE.


A listing is a formal description. There is a stronger reason,
however. A listing of the programs is a formal description, an
algorithm, an exact statement in a formal language of what we may
have been describing, perhaps inaccurately, in a natural language
(English). It becomes the starting point of serious discussions.
The reader who is skeptical at some point, or did not understand
some English statement, can always clarify his doubts in the listing.
To be understandable, the listing has to have annotations, comments.


A mathematician is not forced to explain his work always in
natural language, but rather he is allowed to employ abstract
notations, symbolisms, formalizations of his thoughts (indeed, it is
preferable this way). A programmer should not hide his listings (he
should not be forced to re-state his algorithms in natural language
exclusively { 68}) and force his readers to use the ambiguous channels
of his natural language communication.


And this brings another point. Not only should a programmer not
hide the listing (unless there are bugs or incomplete subroutines),
but he should not hide the programs (unless they are banal); by this
I mean honest and reasonable efforts should be made to facilitate
future potential users the access to these programs. Include:

- Documentation

- Listings, tape or card deck names, etc.

- Test data

- Printout of an interaction with such test data,
  including loading, compilation, execution, results.

- Time spent (by machine and by man).

See also R. Rain's letter {C. ACM March 67}.
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